On Jun 19, 2006, at 22:45, Anthony Ettinger wrote:
# order matters
$raw_text =~ s/\015\012/\n/g;
$raw_text =~ s/\012/\n/g unless "\n" eq "\012";
$raw_text =~ s/\015/\n/g unless "\n" eq "\015";
Does it make any difference if I use s/\cM\cJ/\cJ/ vs.
s/\015\012/\n/g?
The regexp is OK, the replacement string is not, because \cJ is not
necessarily eq "\n". The latter is portable, the former is not.
Since the file's newline convention is not necessarily that of the
runtime platform, you cannot write a line-oriented script. If the
files are too big to slurp, you'd work on chunks, but then you need to
check by hand whether a CRLF has been cut in the middle.
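A rough sketch of that chunked approach, in case it helps. The name process_in_chunks, the chunk size, and the callback interface are made up for illustration, and the three substitutions are folded into a single alternation that does the same job in one pass. A CR at the very end of a chunk is held back, because it may be the first half of a CRLF:

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in fixed-size chunks and hand each
# newline-normalized chunk to $callback.
sub process_in_chunks {
    my ($fh, $chunk_size, $callback) = @_;
    binmode $fh;        # raw bytes, no platform newline translation
    my $carry = '';     # trailing CR held back from the previous chunk
    while (read($fh, my $buf, $chunk_size)) {
        $buf = $carry . $buf;
        # A CR at the very end may be the first half of a CRLF: hold it back.
        $carry = $buf =~ s/\015\z// ? "\015" : '';
        # CRLF is listed first so it is never seen as CR followed by LF.
        $buf =~ s/\015\012|\012|\015/\n/g;
        $callback->($buf) if length $buf;
    }
    # The input ended with a bare CR: it was a newline after all.
    $callback->("\n") if length $carry;
}

# Made-up sample mixing CRLF, bare CR, and bare LF; the chunk size of 2
# forces the CRLF to be cut in the middle.
my $data = "a\015\012b\015c\012d\015";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $out = '';
process_in_chunks($fh, 2, sub { $out .= $_[0] });
close $fh;
print $out eq "a\nb\nc\nd\n" ? "ok\n" : "not ok\n";
```

The callback still receives arbitrary chunks, not lines. If you need whole lines you would also have to buffer the text after the last "\n" of each chunk.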
I'm reading each line in a while loop, so it should work fine on a
large file?
The while loop iterates over lines ***as long as they are encoded
using the conventions of the runtime platform***. The diamond operator
uses $/ as the record separator, which is "\n" by default. Since the
purpose of your script is to deal with *any* newline convention, in
general a while loop like
while (my $line = <$fh>) { ... }
looks suspicious: it may signal that the programmer does not fully
understand what's going on. The variable should be called
$chunk_of_text instead of $line, since you don't know whether you'll
get a "line".
For instance, TextWrangler is known to use old-Mac conventions (bare
CR) by default (last time I checked). If you read a file like that
with that while loop on either Unix or Windows, you'll slurp the
entire file in a single iteration. That is, $line will contain the
whole file.
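You can see the effect without a real file. This is only a sketch: the sample string is made up, and an in-memory filehandle stands in for a file saved with old-Mac line endings. On a platform where "\n" is "\012" (Unix, Windows) the first read should give a single chunk, and the second, with $/ set to CR, three records:

```perl
use strict;
use warnings;

# Made-up sample: three "lines" terminated by bare CR (old-Mac style).
my $data = "one\015two\015three\015";

# Read with the default $/, which is "\n".
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @chunks = <$fh>;
close $fh;
print "default \$/: ", scalar(@chunks), " chunk(s)\n";

# Read again, telling Perl that records end in CR.
my @lines;
{
    local $/ = "\015";
    open my $fh2, '<', \$data or die "Cannot open in-memory file: $!";
    @lines = <$fh2>;
    close $fh2;
}
print "CR as \$/: ", scalar(@lines), " record(s)\n";
```

Setting $/ by hand only helps when you already know the convention, which is precisely what you don't know here.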
In general, to be robust to newline conventions you need to do some
munging by hand before using regular, portable line-oriented idioms.
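For files small enough to slurp, the munging can be as simple as the following sketch. The sub name normalize_newlines and the sample text are invented for the example; the substitutions are the ones quoted at the top of this message. Once the text is normalized, splitting on "\n" is portable:

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF, bare LF, and bare CR to the
# platform's logical newline "\n".
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012/\n/g;                      # CRLF first, order matters
    $text =~ s/\012/\n/g unless "\n" eq "\012";    # bare LF
    $text =~ s/\015/\n/g unless "\n" eq "\015";    # bare CR
    return $text;
}

# Made-up input mixing Unix, DOS, and old-Mac line endings.
my $raw_text = "unix\012dos\015\012mac\015last";
my @lines = split /\n/, normalize_newlines($raw_text);
print "$_\n" for @lines;
```

When the text comes from a real file, you'd want binmode on the filehandle before reading, so that no I/O layer translates newlines behind your back.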
-- fxn