> -----Original Message-----
> From: Peter Cline [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 04, 2002 3:49 PM
> To: [EMAIL PROTECTED]
> Subject: Strange (from my perspective) regex behavior
> 
> 
> I am trying to extract some text from a file using a regular 
> expression.  It is not behaving as expected and I am totally 
> perplexed as to why.
> Here is an excerpt of the text:
> 
> 1. Top Story: Dynegy in Agreement to Get Enron Pipeline
> 2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
> 3. Investment Banking: Goldman, Sandler, Merrill Lynch
> 4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
> 5. Venture Capital: Lucent-Coller Capital, EM.TV
> 6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
> 7. Legal: GE Capital Aviation, EchoStar-DirecTV
> 8. Correction: Daily Deal Echostar-DirecTV  Article
> 
> 
> /------------------advertisement--------------\
> 
> I want to extract the numbered list.
> 
> here is the regex I am using to do it:
> m!((\d\.\s\D+)+)/[-]+advertisement!
> 
> For some reason this starts matching at number 7.

That's because of the 3000 in item #6, which doesn't match \D. A
match starting at any earlier item gets stuck there and can never
reach the advertisement line, so the start of item #7 is the
leftmost point where it can start matching.
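A small script can confirm this. It is a sketch that assumes the quoted excerpt is the entire input, held in a hypothetical `$text` variable. It runs the original pattern, prints the first two characters of the captured list, and then reruns the pattern with the "3000" removed. The second run starts at item 1, which shows that the digits are what block the earlier start.

```perl
use strict;
use warnings;

# Assumption: $text holds only the excerpt quoted above.
my $text = <<'END';
1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV  Article


/------------------advertisement--------------\
END

# The original pattern: each item is "digit, dot, whitespace, non-digits".
my $re = qr!((\d\.\s\D+)+)/[-]+advertisement!;

# In list context the match returns the captures; $orig receives $1.
my ($orig) = $text =~ $re;
print "original starts at: ", substr($orig, 0, 2), "\n";      # prints "7."

# Same text with the digits in "Canada 3000" removed.
(my $no_digits = $text) =~ s/Canada 3000/Canada/;
my ($fixed) = $no_digits =~ $re;
print "without 3000 starts at: ", substr($fixed, 0, 2), "\n"; # prints "1."
```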

>  If I eliminate 
> everything after / the regex matches from 1 to the / in item 4.

Again, the 3000 in item #6 is causing the problem. The chain of
items can't get past it, so the regex backtracks to the last / it
can reach before item #6, which is the one in item 4.
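The shortened pattern can be checked the same way. This sketch again assumes the excerpt is the whole input in a hypothetical `$text`, and prints the last line of what the pattern captured. That line ends right before the slash in "I.P.O.s/Offerings".

```perl
use strict;
use warnings;

# Assumption: $text holds only the excerpt quoted above.
my $text = <<'END';
1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV  Article


/------------------advertisement--------------\
END

# The shortened pattern: everything after the "/" has been dropped.
my ($list) = $text =~ m!((\d\.\s\D+)+)/!;

# The last line of the capture shows where the match stopped.
my $last = (split /\n/, $list)[-1];
print "last captured line: $last\n";    # prints "last captured line: 4. I.P.O.s"
```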

> 
> I am totally perplexed as to why this is happening.  If 
> anyone has insight, 
> I would be most appreciative.

Why not just use:

   ($text) = m!(\d\..*)/-+advertisement!s;

Or, to get the lines into an array:

   @lines = m!(\d\..*)!g;
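Both one-liners match against $_, so a quick way to try them is to put the excerpt there. The sketch below assumes the excerpt is the whole file contents. In the first pattern, /s lets .* run across newlines and backtracking stops it just before the advertisement line. In the second, .* stops at each end of line, giving one array element per item.

```perl
use strict;
use warnings;

# Assumption: the file contents (here, just the excerpt) are in $_.
$_ = <<'END';
1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV  Article


/------------------advertisement--------------\
END

# Whole list as one string (includes the blank lines before the ad).
my ($text) = m!(\d\..*)/-+advertisement!s;

# One element per numbered line.
my @lines = m!(\d\..*)!g;

print scalar(@lines), " items\n";   # prints "8 items"
print "$lines[-1]\n";
```

One caveat: `\d\.` matches a single digit, so for an item numbered 10 or higher the array version would start at the last digit and capture "0. ..."; writing `\d+\.` avoids that.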

