Re: Need help with multi-line regex with DOS line terminations

Gunnar Hjalmarsson Tue, 08 Jan 2008 04:25:17 -0800

Zembower, Kevin wrote:

I'm trying to process a DOS text file (with DOS CRLF line terminations)
and translate from one database export format into another database
input format.


And you are obviously doing it on a UNIX-like platform.

I've pasted in my program and a short example file of data
at the end of this message.


500+ lines is not very short IMO. ;-)

I think my problem is caused by the DOS line terminations and the way
I'm trying to handle them in my overall program.


What makes you think that?

I'd advise you to start working with a data-set with \012 newlines. Only when that works as expected, you should deal with the fact that the newlines of the data are represented by \015\012.
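
One way to get such a data-set is to convert the \015\012 pairs to plain \012 before any matching is done. What follows is only a minimal sketch: the record text and the variable names $record and $unix are made up for illustration and are not taken from your program.

```perl
use strict;
use warnings;

# Hypothetical sample record with DOS (CRLF) line terminations.
my $record = "AD  - Department of Family and Community Medicine,\015\012"
           . "      University, Dammam, Saudi Arabia.\015\012"
           . "FAU - Example Author\015\012";

# Copy the record and turn every CRLF pair into a bare LF in the copy.
(my $unix = $record) =~ s/\015\012/\012/g;

# tr/// with an empty replacement list only counts characters; it changes nothing.
print "carriage returns left: ", ($unix =~ tr/\015//), "\n";
```

Once the regex behaves on $unix, the same pattern can be adapted to the raw data by writing \015\012 wherever it currently has \012.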

My problem is lines that look like this:
AD  - Department of Family and Community Medicine, College of Medicine,
King Faisal^M$
      University, Dammam, Saudi Arabia. [EMAIL PROTECTED]

(This should be just two lines; my email program is wrapping them.) I'm
trying to capture everything from the first 'AD  - ' to the next set of
four characters that are either upper-case letters or blanks, followed
by a dash and a blank. I tried to use this:
   my($ad) = /AD  - (.*?)\015\012([A-Z]|\s){4}-\s/;

This regex only matches the one address in my sample data that consists
of just one line. It fails to match anything for the multi-line
addresses.

You seem to want the /s modifier, to make "." match also newlines. Read about the /s modifier in "perldoc perlre".
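
To illustrate the difference, here is a small sketch run against made-up data with \012 newlines. The sample text and the names $data, $without and $with are assumptions for the example, and the character class [A-Z ]{4} is my own simplification of your ([A-Z]|\s){4} group, chosen so that it does not add a second capture.

```perl
use strict;
use warnings;

# Hypothetical two-line AD field followed by the next field tag.
my $data = "AD  - Department of Family and Community Medicine, King Faisal\012"
         . "      University, Dammam, Saudi Arabia.\012"
         . "FAU - Example Author\012";

# Without /s, "." stops at the newline inside the address, so nothing matches.
my ($without) = $data =~ /AD  - (.*?)\012[A-Z ]{4}- /;

# With /s, "." also matches newlines, so the lazy match can span both lines.
my ($with) = $data =~ /AD  - (.*?)\012[A-Z ]{4}- /s;

print defined $without ? "without /s: matched\n" : "without /s: no match\n";
print "with /s: $with\n" if defined $with;
```

The captured address still contains the embedded newline and the leading blanks of the continuation line, so you will probably want to collapse those afterwards.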


--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

