[DotNetDevelopment] Re: Regex

Cerebrus Sat, 09 May 2009 01:55:58 -0700

For the record, it is usually very difficult to construct a Regex that
reliably matches HTML, because there are too many possibilities
(including HTML that is not well formed). Regex is therefore not
recommended for HTML parsing.
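
To illustrate what a parser-based approach could look like, here is a
minimal sketch in Python using the standard library's html.parser
module. Python is used purely for illustration; on .NET you would reach
for an HTML parsing library instead. The names ParagraphExtractor and
extract_paragraphs are hypothetical helpers made up for this example.
Unlike a simple pattern, it keeps tags nested inside the paragraph:

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the inner markup of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._parts = None    # pieces of the paragraph being read, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> both arrive as "p".
        if tag == "p":
            self._parts = []
        elif self._parts is not None:
            # Keep a nested start tag exactly as it appeared in the source.
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._parts is None:
            return
        if tag == "p":
            self.paragraphs.append("".join(self._parts))
            self._parts = None
        else:
            self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


def extract_paragraphs(html):
    """Return the inner markup of every <p>...</p> found in html."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(extract_paragraphs("<p>This is an <i>italicized paragraph</i></p>"))
```

This is only a sketch: it does not handle paragraphs nested inside
paragraphs, and self-closing tags such as <br/> would need extra care.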

Also, your Regex syntax is quite amiss. Character sets are defined in
square brackets [], while parentheses () define capturing groups (which
is what backreferences refer to). A set matches exactly one of the
characters in it (unless you add a quantifier), so you do not need the
alternation operator ( | ) inside a set. Here's a sample Regex that
matches text within <p>...</p> tags (but does not allow any other tags
within):

\<[pP]([^<]*?)\</[pP]\>

When used against the following test data:

Matches - <p>This is an XHTML compliant paragraph</p>
Matches - <P>This is an HTML paragraph</P>
Does not match - <p>This is an <i>italicized paragraph</i></p>
Matches - <p style="font-weight:bold;">This is a bold paragraph</p>
Matches - <p>This is just another paragraph</p>
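
The results above can be reproduced with a short script. The sketch
below uses Python's re module for illustration (the pattern itself is
unchanged, and I would expect .NET's Regex class to treat it the same
way). The names P_TAG and matches are hypothetical, chosen for this
example only:

```python
import re

# The sample pattern from this post, unchanged.
P_TAG = re.compile(r"\<[pP]([^<]*?)\</[pP]\>")

samples = [
    "<p>This is an XHTML compliant paragraph</p>",
    "<P>This is an HTML paragraph</P>",
    "<p>This is an <i>italicized paragraph</i></p>",
    '<p style="font-weight:bold;">This is a bold paragraph</p>',
    "<p>This is just another paragraph</p>",
]


def matches(text):
    """Return True if the pattern is found anywhere in text."""
    return P_TAG.search(text) is not None


for sample in samples:
    label = "Matches" if matches(sample) else "Does not match"
    print(f"{label} - {sample}")
```

Note that the group captures everything after the tag name, so it
includes any attributes and the closing > of the opening tag. Because
only a single p or P is required after the <, other tag names that
begin with p can also slip through.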

On May 8, 3:56 pm, Ibrahim <[email protected]> wrote:
> Hi
> I am trying to make a regular expression that retrieves a string from
> an HTML file.
> The string to be retrieved is the one between <P > and </P>; the
> string may contain further tags.
> How can I make this work? What's wrong with it? Why doesn't it work?
>
> <(p|P)[.*?>.*?][<p|P>]
> Regards
