RE: Strange (from my perspective) regex behavior

SathishDuraisamy Fri, 04 Jan 2002 14:23:23 -0800

I am trying to extract some text from a file using a regular 
expression.  It is not behaving as expected, and I am totally perplexed as to
why.
Here is an excerpt of the text:


1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV  Article


/------------------advertisement--------------\

I want to extract the numbered list.

Here is the regex I am using to do it:
m!((\d\.\s\D+)+)/[-]+advertisement!

For some reason this starts matching at number 7.

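> The digits in "3000" (item 6) cannot be matched by \D, so the attempt
> that starts at item 1 fails before it reaches the advertisement line.
> Perl then retries from later positions, and the first one that succeeds
> is the next '\d\.', which is "7. ....".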
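
A minimal sketch of the behavior described above, run against the excerpt. The second pattern is only one possible rewrite, not something proposed in this thread: it assumes each list item sits on its own line and begins with a number, a dot, and a space, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample text copied from the excerpt in the question.
my $text = <<'END';
1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV  Article


/------------------advertisement--------------\
END

# Original pattern: \D+ cannot cross the digits in "3000", so no match
# that begins at items 1 to 6 can reach the advertisement line.
my ($orig) = $text =~ m!((\d\.\s\D+)+)/[-]+advertisement!;

# Assumed alternative: treat an item as one whole line that starts with
# "N. " and allow anything except a newline in the rest of that line.
my ($fixed) = $text =~ m!((?:^\d+\.\s[^\n]*\n)+)\s*/-+advertisement!m;

print "original capture starts with: ", substr( $orig,  0, 9 ),  "\n";
print "line-based capture starts with: ", substr( $fixed, 0, 13 ), "\n";
```

On this excerpt the original capture begins at "7. Legal:" while the line-based capture begins at "1. Top Story:".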

If I eliminate everything after /, the regex matches from 1 to the / in item 4.
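
A small sketch of that second observation, assuming "everything after /" refers to the pattern (the poster does not say so explicitly) and using a reduced excerpt of items 3 to 6 to keep it short. With the tail of the pattern removed, the regex only needs a "/" after the repeated group, and the only "/" it can back up to is the one in "I.P.O.s/Offerings".

```perl
use strict;
use warnings;

# Reduced excerpt (items 3 to 6 only); the full list behaves the same way.
my $text = <<'END';
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
END

# The pattern with everything after the "/" removed.
my ($list) = $text =~ m!((\d\.\s\D+)+)/!;

# Pull out the last line of the captured text to see where it stops.
my ($tail) = $list =~ /([^\n]*)\z/;
print "capture ends with: $tail\n";
```

The greedy \D+ first runs up to "Canada ", fails to find a "/" there, and then backtracks until the group ends just before the slash in item 4, so the capture ends with "4. I.P.O.s".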

> I couldn't tell whether you removed the line
> "/------------------advertisement--------------\"
> or everything after "4. I.P.O.s/".  Either way, I think a better approach
> would be to read these lines in a loop instead of using a single
> regex.
> Something like:
> my @ad_lines = ();
> foreach ( @lines ) {
>     last if m{^/-+advertisement};           # stop at the advertisement banner
>     push( @ad_lines, $_ ) if /^\d+\.\s/;    # keep only the numbered items
> }


[Sathish]

