R (Chandra) Chandrasekhar wrote:
Dear Folks,

Hello,

This is a question about s///sg across lines from a file slurped in file mode.

I am trying to change occurrences of & into &amp;. As a minimal example, I used the contrived file below where single- and multi-line records are delimited by <...>. The only real-world text is a hyperlink from an actual web site.
--------
<Pebbles & Pelicans>
<Hogsworthy Tales of a Late Summer Afternoon>
<Everything News
worthy & Printable>
# This is a comment and should be ignored
<Rough &amp; Tumble>
<Pride & Prejudice>
<P&O Shipping Corporation>
<a href=http://www.amazon.com/s?ie=UTF8&tag=mozilla-20&index=blended&link%5Fcode=qs&field-keywords=Programming%20Perl&sourceid=Mozilla-search>
<This record should not be
modified.>
<Nor this.>
<This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols This is a very long line with many intervening & symbols>
<sudo apt-get update && sudo apt-get upgrade>
--------

I tried this script on the above file:

Very good.  Both data and code to work with.  Thanks.

--------
#!/usr/bin/perl
use warnings;
use diagnostics;
use strict;

my ($fh, $file, $data, $count, @record, $record);
my (@orig, $orig, @repl, $repl, $subs, @amper);

You shouldn't declare all your variables here. It is better to declare them in the smallest scope possible.
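
As a rough sketch of what "smallest scope possible" could look like for the slurping step: the slurp() helper below is a hypothetical name, not something from your script, and it uses "local $/" so that the input record separator is only undefined inside the sub instead of for the rest of the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): read a whole file
# into one string, declaring each variable where it is first needed.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    local $/;            # undefined only until this sub returns
    my $data = <$fh>;    # one read returns the entire file
    close $fh;
    return $data;
}

# Usage with an in-memory file, so the sketch runs without touching disk.
my $text = "<Pebbles & Pelicans>\n<Nor this.>\n";
my $data = slurp( \$text );
print length($data), " characters read\n";
```

In the real script the call would be something like slurp(shift) with the file name from the command line.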


undef $/; # Slurp data in file mode
$file = shift;

my $file = shift;

open $fh, '<', $file or die "Cannot open $file: $!\n";

open my $fh, '<', $file or die "Cannot open $file: $!\n";

$data = <$fh>;

my $data = <$fh>;
close $fh;

$count = 0;

my $count = 0;

while ($data =~ m|<\s*?(.*?)\s*?>|gis)

I ran this using "use re 'debug';" and that expression is pretty inefficient. Also, the /i option requests a case-insensitive match, but the pattern contains no letters, so it has no effect. This should be more efficient:

while ( $data =~ /<\s*([^<>]*)\s*>/g )
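
A small self-contained sketch of that loop, run against a made-up two-record sample rather than your full file. The negated character class [^<>] also matches newlines, so multi-line records are captured without the /s option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: one single-line record, a comment, one multi-line record.
my $data = "<Pebbles & Pelicans>\n"
         . "# This is a comment\n"
         . "<Everything News\nworthy & Printable>\n";

my @record;
while ( $data =~ /<\s*([^<>]*)\s*>/g ) {
    push @record, $1;    # text between < and >, leading whitespace skipped
}

printf "%d: %s\n", $_ + 1, $record[$_] for 0 .. $#record;
```

One difference worth knowing: because [^<>]* is greedy, any whitespace just before the closing > ends up inside $1 and the final \s* matches nothing. If that matters, trim $1 afterwards.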


    {
    $count++;
    print "$count: $1\n";
    push @record, $1;
    }
close $fh;

$count = 0;
foreach $record (@record)

foreach my $record ( @record )

    {
    $count++;
    if (($record !~ m|&amp;|s) && ($record =~ m|&|s))

You are only choosing records that contain '&' but not '&amp;'. What if a record contains both '&' and '&amp;'?


        {
        push @orig, $record;
        push @amper, $count;
$subs = ($record =~ s|&|&amp;|gis); # Should be number of substitutions

The /s option only affects the . meta-character, and the /i option only affects characters that have different upper and lower case forms. The pattern /&/ contains neither, so both options are redundant. Also, to handle records that contain both '&' and '&amp;', use a negative look-ahead:

           my $subs = $record =~ s/&(?!amp;)/&amp;/g;
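
To illustrate, here is that substitution applied to a made-up record that mixes an already-escaped '&amp;' with bare '&' characters. Only the bare ones are rewritten.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up record: one '&amp;' that must stay as it is, two bare '&'.
my $record = 'Rough &amp; Tumble & P&O';

# (?!amp;) is a zero-width check: the '&' matches only when it is
# NOT immediately followed by 'amp;'.
my $subs = $record =~ s/&(?!amp;)/&amp;/g;

print "$subs replacement(s) made: $record\n";
```

Note that this look-ahead only knows about '&amp;'. Other entities such as '&lt;' would still have their '&' escaped, so extend the look-ahead if your data contains them.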


        if ($subs == 0) {warn "$count: No replacement made.\n";}
        else {print "$count: $subs replacement(s) made.\n";}
        push @repl, $record;
        }
    }

foreach $orig (@orig)

foreach my $orig ( @orig )

    {
    $subs = 0;
    $repl = shift @repl;
    $count = shift @amper;

      my $subs = 0;
      my $repl = shift @repl;
      my $count = shift @amper;

    $subs = ($data =~ s|$orig|$repl|gs);

You have a problem with this record:

<a href=http://www.amazon.com/s?ie=UTF8&tag=mozilla-20&index=blended&link%5Fcode=qs&field-keywords=Programming%20Perl&sourceid=Mozilla-search>

which is not working, because the string contains regular expression meta-characters, so when it is used as a pattern it does not match the literal text in $data. (The pattern 'com/s?ie' will match either 'com/sie' or 'com/ie' but not the literal 'com/s?ie'.) You have to quote it, with quotemeta or the equivalent \Q escape, to get it to match correctly:

      my $subs = $data =~ s/\Q$orig/$repl/g;
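
A short sketch of the difference, using a shortened version of that URL as stand-in data. The variable names $copy, $without and $with are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened stand-in for the hyperlink record.
my $data = '<a href=http://www.amazon.com/s?ie=UTF8&tag=mozilla-20>';
my $orig = 'a href=http://www.amazon.com/s?ie=UTF8&tag=mozilla-20';
( my $repl = $orig ) =~ s/&(?!amp;)/&amp;/g;

# Without \Q the '?' in $orig is a quantifier, so the pattern
# cannot match the literal 's?ie' in the data.
my $copy    = $data;
my $without = $copy =~ s/$orig/$repl/g;

# With \Q every meta-character in $orig is matched literally.
my $with = $data =~ s/\Q$orig\E/$repl/g;

print "without \\Q: ", ( $without || 0 ), " replacement(s)\n";
print "with \\Q:    ", ( $with    || 0 ), " replacement(s)\n";
print "$data\n";
```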


    if ($subs == 0) {warn "$count: No replacement made.\n";} # Why?
    else {print "$count: $subs replacement(s) made.\n";}
    }

open $fh, '>', "$file.new" or die "Cannot open $file: $!\n";
print $fh $data;
close $fh;
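
If it helps, the points above could also be combined into a single pass, which avoids the second search-and-replace over $data altogether. This is only one possible sketch, not a drop-in replacement for your script. The escape_ampersands() name is hypothetical, and it does not keep the per-record counts your version prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape bare ampersands inside every <...> record
# of the slurped text, leaving text outside records untouched.
sub escape_ampersands {
    my ($data) = @_;
    my $total = 0;
    $data =~ s{<([^<>]*)>}{
        my $record = $1;    # copy before the inner substitution runs
        $total += ( $record =~ s/&(?!amp;)/&amp;/g ) || 0;
        "<$record>";
    }ge;
    return ( $data, $total );
}

# Made-up sample: the comment line is outside <...> and is not changed.
my ( $new, $total ) = escape_ampersands(
    "<Pebbles & Pelicans>\n# Fish & chips\n<Rough &amp; Tumble>\n<P&O>\n"
);
print "$total replacement(s) made.\n", $new;
```

The /e modifier evaluates the replacement as Perl code, so each record is rewritten in place where it was found and no quoting with \Q is needed.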



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
