From: "Johnson, Shaunn" <[EMAIL PROTECTED]>
> I have this bit of code (below) and I'm wondering 
> if there is a quicker way to remove 
> some odd-ball characters from very
> large text files (large would be about the
> 200M or so).
> 
> [snip code]
> 
> #!/usr/bin/perl
> 
> #$_ =~ s/\cM\n/\n/g;
> 
> while (<>) {
>    $_ =~ s/(\cM\n|\\|\~|\!|\@|\#|\$|\%|\^|\&|\*|\(|\))/\n/g;
>    print $_;
> }
> 
> [/snip code]
> 
> I want to add some variable to pass (and rename INPUT file)
> but before I do, I'd like to know if doing something like open()
> would be any faster than this. 

tr/// is quicker than s///, so use it wherever it can do the job. 
Your s/// also looks a bit strange: do you really want to replace 
each of those characters with a newline?

Also, is there any reason to keep a \cM that is not followed by \n in 
the file? If not, you could use

        $_ =~ tr{\cM\\~!@#$%^&*()}{\n};
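
If you want to check both the speed and the behaviour yourself, here is a rough sketch using the core Benchmark module. The sample line, the sub names with_s and with_tr, and the exact tr/// character list (copied from the alternation in your s///) are my assumptions, so adjust them to your real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample line with a DOS line ending; not taken from the real files.
my $line = ("field!value\@host#" x 50) . "\cM\n";

# The original approach: one s/// with an alternation.
sub with_s {
    my $copy = $line;
    $copy =~ s/(\cM\n|\\|\~|\!|\@|\#|\$|\%|\^|\&|\*|\(|\))/\n/g;
    return $copy;
}

# The tr/// approach: every listed character is mapped to "\n".
# tr/// does no variable interpolation, so @ and $ are literal here.
sub with_tr {
    my $copy = $line;
    $copy =~ tr{\cM\\~!@#$%^&*()}{\n};
    return $copy;
}

# The two results differ on the "\cM\n" pair (see the note below).
print "results are ", (with_s() eq with_tr() ? "identical" : "different"), "\n";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, { 's///' => \&with_s, 'tr///' => \&with_tr });
```

One difference to keep in mind: the s/// turns a \cM\n pair into a single newline, while the tr/// maps the \cM on its own and leaves the \n alone, so a DOS line ending becomes two newlines. If that matters, drop \cM from the tr/// list and handle it separately.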

Also, if replacing every \cM is fine, it would be more efficient to 
read the file in fixed-size chunks instead of line by line:

        open my $IN, '<', $filename
                or die "Can't open $filename: $!\n";
        open my $OUT, '>', $outfilename
                or die "Can't create $outfilename: $!\n";
        my $buff;
        # read 10 KB at a time
        while (read $IN, $buff, 10*1024) {
                $buff =~ tr{\cM\\~!@#$%^&*()}{\n};
                print $OUT $buff;
        }
        close $IN;
        close $OUT;
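
Since you mentioned wanting to pass the file names in, the chunked loop could be wrapped up roughly like this. The sub name clean_file, the 64 KB chunk size, the binmode calls and the tr/// character list (taken from the alternation in your s///) are my assumptions, not something from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in_name to $out_name in fixed-size chunks,
# mapping every listed character to a newline.
sub clean_file {
    my ($in_name, $out_name) = @_;

    open my $IN, '<', $in_name
        or die "Can't open $in_name: $!\n";
    open my $OUT, '>', $out_name
        or die "Can't create $out_name: $!\n";

    # Raw bytes in and out, so no I/O layer touches the \cM characters.
    binmode $IN;
    binmode $OUT;

    my $buff;
    while (read $IN, $buff, 64 * 1024) {
        # tr/// works character by character, so it does not matter
        # where a chunk boundary falls.
        $buff =~ tr{\cM\\~!@#$%^&*()}{\n};
        print {$OUT} $buff
            or die "Can't write to $out_name: $!\n";
    }

    close $IN;
    close $OUT
        or die "Can't finish writing $out_name: $!\n";
    return;
}

# When run with two arguments, treat them as input and output file names.
clean_file(@ARGV) if @ARGV == 2;
```

This only works chunk by chunk because every character is handled on its own. A pattern like \cM\n could be split across two chunks, which is why the chunked version assumes that replacing every \cM is acceptable.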

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

