> -----Original Message-----
> From: Peter Cline [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 04, 2002 3:49 PM
> To: [EMAIL PROTECTED]
> Subject: Strange (from my perspective) regex behavior
>
>
> I am trying to extract some text from a file using a regular
> expression. It is not behaving as expected and I am totally
> perplexed as to why.
> Here is an excerpt of the text
>
> 1. Top Story: Dynegy in Agreement to Get Enron Pipeline
> 2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
> 3. Investment Banking: Goldman, Sandler, Merrill Lynch
> 4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
> 5. Venture Capital: Lucent-Coller Capital, EM.TV
> 6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
> 7. Legal: GE Capital Aviation, EchoStar-DirecTV
> 8. Correction: Daily Deal Echostar-DirecTV Article
>
>
> /------------------advertisement--------------\
>
> I want to extract the numbered list.
>
> here is the regex I am using to do it:
> m!((\d\.\s\D+)+)/[-]+advertisement!
>
> For some reason this starts matching at number 7.
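That's because of the 3000 in item #6. \D+ cannot match the digits,
and "3000," is not a digit followed by a period, so no match that
starts before the 3000 can ever reach the /---advertisement line.
The start of item #7 is the leftmost point where a match can begin.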
> If I eliminate
> everything after / the regex matches from 1 to the / in item 4.
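Again, the 3000 in item #6 is causing the problem. The regex matches
greedily up to the 3000, then has to backtrack until it finds a /.
The / in item 4 is the last one before the 3000, so that is as far
as it can go and still match.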
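
Here is a small script that reproduces both results, in case you want
to experiment. It is only a sketch: $text holds just the excerpt from
your message (the variable names are mine), not your whole file.

```perl
use strict;
use warnings;

# The excerpt from the original message, including the ad banner line.
my $text = <<'END';
1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV Article


/------------------advertisement--------------\
END

# Original pattern: \D+ cannot cross the "3000" in item 6, so the only
# match that reaches the banner has to start after it.
my ($full) = $text =~ m!((\d\.\s\D+)+)/[-]+advertisement!;

# Same pattern cut off after the slash: the match starts at item 1,
# runs up to the "3000", then backtracks to the nearest earlier slash.
my ($short) = $text =~ m!((\d\.\s\D+)+)/!;

print "full match starts with: ", substr($full, 0, 9), "\n";    # 7. Legal:
print "short match ends with:  ", substr($short, -10), "\n";    # 4. I.P.O.s
```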
>
> I am totally perplexed as to why this is happening. If
> anyone has insight,
> I would be most appreciative.
Why not just use:
($text) = m!(\d\..*)/-+advertisement!s;
Or, to get the lines into an array (no /s here, so . stops at each
newline and every match is one line):
@lines = m!(\d\..*)!g;
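
A runnable sketch of both suggestions follows. It assumes $_ holds the
text (here only the excerpt from your message). The third pattern is
my own variation, not something you need: it anchors each item at the
start of a line with ^ and /m, which may help if the rest of the file
contains other digit-plus-period sequences such as "3.5".

```perl
use strict;
use warnings;

# Sample input copied from the original message; in a real script $_
# would hold the contents of the whole file.
$_ = <<'END';
1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV Article


/------------------advertisement--------------\
END

# With /s, . also matches newlines, so one greedy .* spans the whole
# list from the first "digit." up to the banner line.
my ($text) = m!(\d\..*)/-+advertisement!s;

# Without /s, . stops at each newline, so /g returns one item per line.
my @lines = m!(\d\..*)!g;

# Assumed variation: require each item to begin at the start of a line.
my @anchored = m!^(\d+\.\s.*)!mg;

print scalar(@lines), " items found\n";
print "$_\n" for @lines;
```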