I am trying to extract some text from a file using a regular expression. It is not behaving as expected, and I am totally perplexed as to why. Here is an excerpt of the text:

1. Top Story: Dynegy in Agreement to Get Enron Pipeline
2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom
3. Investment Banking: Goldman, Sandler, Merrill Lynch
4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt
5. Venture Capital: Lucent-Coller Capital, EM.TV
6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire
7. Legal: GE Capital Aviation, EchoStar-DirecTV
8. Correction: Daily Deal Echostar-DirecTV Article
/------------------advertisement--------------\

I want to extract the numbered list. Here is the regex I am using to do it:

m!((\d\.\s\D+)+)/[-]+advertisement!

For some reason this starts matching at number 7.

> The number 3000 cannot be matched by \D, so every attempt that
> starts at items 1 through 6 fails when it reaches "Canada 3000".
> Perl then restarts its matching from the next '\d\.', which is "7. ...".

If I eliminate everything after / the regex matches from 1 to the / in item 4.

> I couldn't tell whether you erased the line
> "/------------------advertisement--------------\"
> or everything after 4. "I.P.O.s/". Either way, I think the better
> approach would be to read these lines in a loop instead of using a
> single regex.
> Something like:
>
> my @ad_lines = ();
> foreach ( @lines ) {
>     push( @ad_lines, $_ ) if /\d\./;   # keep lines that look like list items
>     last if /\/\-+advertisement/;      # stop at the advertisement rule
> }

[Sathish]
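
If a single regex is still preferred, one possible workaround is sketched below. It is only a sketch under a few assumptions: the excerpt is held in one string (the variable names $text, $orig, $list and @items are made up for illustration), item numbers are always followed by a dot and whitespace, and nothing inside an item looks like "digits, dot, space". Instead of \D+, it uses a non-greedy .+? to run up to the advertisement rule, then splits the captured text at each item number. The first match just reproduces the behaviour described above, where the original pattern only succeeds from item 7.

```perl
use strict;
use warnings;

# The excerpt from the original message, joined into one string.
my $text = join ' ',
    '1. Top Story: Dynegy in Agreement to Get Enron Pipeline',
    '2. M&A: Newmont-Normandy, Hewlett-Compaq, Pax TV, WorldCom',
    '3. Investment Banking: Goldman, Sandler, Merrill Lynch',
    '4. I.P.O.s/Offerings: Sirius Satellite Radio, Neuer Markt',
    '5. Venture Capital: Lucent-Coller Capital, EM.TV',
    '6. Private Equity: HSBC, Canada 3000, Edel Music, Kumho Tire',
    '7. Legal: GE Capital Aviation, EchoStar-DirecTV',
    '8. Correction: Daily Deal Echostar-DirecTV Article',
    '/------------------advertisement--------------\\';

# Original pattern: \D+ cannot cross the digits in "Canada 3000",
# so the first start position that can succeed is item 7.
my ($orig) = $text =~ m!((\d\.\s\D+)+)/[-]+advertisement!;

# Alternative (an assumption, not the only fix): take everything from
# the first "N. " up to the advertisement rule, digits included.
my ($list) = $text =~ m{(\d+\.\s.+?)\s*/-+advertisement}s;

# Split at whitespace that is immediately followed by "N. ".
# "3000," is not followed by a dot, so it does not start a new item.
my @items = split /\s+(?=\d+\.\s)/, $list;

print "original pattern matched from: ", substr($orig, 0, 8), "\n";
print "$_\n" for @items;
```

With the excerpt above, this should report that the original pattern matches from "7. Legal" and then list all eight items, one per line. If the real file contains item text such as "version 2. something", the split would cut in the wrong place, and the line-by-line loop is probably the safer choice.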