From: "Johnson, Shaunn" <[EMAIL PROTECTED]>

> I have this bit of code (below) and I'm wondering
> if there is a quicker way to remove
> some odd-ball characters from very
> large text files (large would be about the
> 200M or so).
>
> [snip code]
>
> #!/usr/bin/perl
>
> #$_ =~ s/\cM\n/\n/g;
>
> while (<>) {
>     $_ =~ s/(\cM\n|\\|\~|\!|\@|\#|\$|\%|\^|\&|\*|\(|\))/\n/g;
>     print $_;
> }
>
> [/snip code]
>
> I want to add some variable to pass (and rename INPUT file)
> but before I do, I'd like to know if doing something like open()
> would be any faster than this.
tr/// is quicker than s///, so if you can use it, you should. Your s/// also looks a bit strange: do you really want to replace each of those characters with a newline? And is there any reason to leave a \cM that is not followed by \n in the file? If not, you could use

    $_ =~ [EMAIL PROTECTED]&*()}{\n};

Also, if replacing every \cM is acceptable, it would be more efficient to read the file in fixed-size chunks instead of line by line:

    open my $IN, "< $filename" or die "Can't open $filename: $!\n";
    open my $OUT, "> $outfilename" or die "Can't create $outfilename: $!\n";
    while (read $IN, $buff, 10*1024) {
        $buff =~ [EMAIL PROTECTED]&*()}{\n};
        print $OUT $buff;
    }
    close $IN;
    close $OUT;

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
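The suggestions above (tr/// instead of s///, chunked read() instead of line-by-line input, and input/output file names passed on the command line) can be combined into one script. What follows is a minimal sketch under stated assumptions, not a definitive version. It deletes every \cM outright rather than translating it to a newline, because mapping \cM to \n would turn each CRLF line ending into two newlines. It maps the other odd-ball characters to \n, as the original s/// did. The sub names clean_chunk, clean_file and main, the 64 KB default chunk size, and the INPUT OUTPUT argument order are all hypothetical choices made for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Clean one buffer in place (passed by reference).
# Assumption: every carriage return (\cM) is deleted, so "\cM\n" ends up as
# a single "\n" even when the two characters land in different chunks.
sub clean_chunk {
    my ($buf_ref) = @_;
    ${$buf_ref} =~ tr/\cM//d;                # drop all carriage returns
    ${$buf_ref} =~ tr{\\~!@#$%^&*()}{\n};    # odd-ball chars -> newline (tr does not interpolate)
    return;
}

# Copy $in_name to $out_name in fixed-size chunks, cleaning each chunk.
sub clean_file {
    my ($in_name, $out_name, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;               # hypothetical default; tune as needed
    open my $in,  '<', $in_name  or die "Can't open $in_name: $!\n";
    open my $out, '>', $out_name or die "Can't create $out_name: $!\n";
    binmode $in;                             # work on raw bytes, no CRLF layer
    binmode $out;
    while (1) {
        my $got = read($in, my $buf, $chunk_size);
        die "Read error on $in_name: $!\n" unless defined $got;
        last if $got == 0;                   # end of file
        clean_chunk(\$buf);
        print {$out} $buf or die "Write error on $out_name: $!\n";
    }
    close $in;
    close $out or die "Can't close $out_name: $!\n";
    return;
}

# Hypothetical command-line wrapper: script INPUT OUTPUT
sub main {
    my @args = @_;
    die "usage: $0 INPUT OUTPUT\n" unless @args == 2;
    clean_file(@args);
    return 0;
}

# Run only when executed directly with arguments, not when loaded.
main(@ARGV) if @ARGV && !caller;
```

Saved under a name such as clean.pl (also hypothetical), it would be run as perl clean.pl big.txt big.clean.txt. Because tr/// works one character at a time, a \cM\n pair split across two chunks is still handled correctly. That is why the chunked read is safe here, whereas the original s/// matched the two-character sequence \cM\n and could miss a pair straddling a chunk boundary.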