Re: Help with pattern matching

John W. Krahn Fri, 02 Apr 2004 15:12:31 -0800

A Lukaszewski wrote:
> 
> Greetings all,

Hello,


> I have a comma-delimited lexical file with four fields: line number, the
> first part of a word, the second part of a word, and the word combined.
> The first and fourth fields are only for reference.  The program I am
> developing is very simple.  If field two and field three both have
> accents in them, then print the line to an output file.
> 
> The heavily-commented program is below.  Thus far, all I get is an exact
> replica of the input file.  In addition to a plain binding operator of
> '=~ //', I have also tried explicit matching (m//) and regex (qr//).
> 
> #!/usr/bin/perl
> 
> #############################################################
> #############################################################
> # A PROGRAM TO READ THE SUB-WORD HEADERS OF A               #
> #              COMMA-DELIMITED FILE                         #
> # AND DETERMINE WHICH LINES HAVE MULTIPLE ACCENTS           #
> #############################################################
> #############################################################
> 
> use strict;

If you had enabled warnings as well as strict, you might have found
your problem a lot sooner.  :-)

use warnings;
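For example, here is a small sketch that reproduces the OP's brace construct (explained further down) on one made-up field value. It collects warnings with a $SIG{__WARN__} handler instead of letting them go to STDERR. With warnings enabled, a successful match inside the braces produces an "Odd number of elements in anonymous hash" warning, which points straight at the problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Made-up second field that contains an accent character ('k').
my $field = 'kat';

# The OP's construct: the braces build an anonymous hash from the match
# result.  A successful match in list context returns the one-element
# list (1), so the hash gets an odd number of elements and warnings
# complains.  The hash reference itself is always true.
my $always_true = ( {$field =~ /[kKc]/} ) ? 1 : 0;

print "condition: $always_true\n";
print "warning:   $_" for @warnings;
```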


> ###################################
> # OPEN THE INPUT AND OUTPUT FILES #
> ###################################
> 
> my ($file, $outfile);
> 
> $file    = 'y.csv' ;
> # Name the input file
> $outfile = 'y.res';
> # Name the output file

In Perl you usually declare a variable where you first use it.  Also,
your comments here add nothing that the code doesn't already say.

my $file    = 'y.csv';
my $outfile = 'y.res';


> open(INFO, "$file"  ) or die "Cannot open $file:$!\n";
> # Open the input file or report failure
> open(OUT, ">>$outfile") or die "Cannot open file y.res!\n";
> # Open the output file
> 
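Separately, the two open calls above can be written in the three-argument form with lexical filehandles, which keeps the mode ('<', '>>') apart from the file name.  Note too that the second die message leaves out $!, so it can't tell you why the open failed.  A minimal sketch, using a temporary file from the core File::Temp module as a stand-in for y.res:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for y.res: a temporary file that is deleted at program exit.
my ( undef, $outfile ) = tempfile( UNLINK => 1 );

# Three-argument open: the mode ('>>' = append) is separate from the
# name, and $! in the message says *why* the open failed.
open my $out, '>>', $outfile or die "Cannot open $outfile: $!\n";
print {$out} "first line\n";
close $out or die "Cannot close $outfile: $!\n";

# Read the file back through a lexical input handle.
open my $in, '<', $outfile or die "Cannot open $outfile: $!\n";
my @lines = <$in>;
close $in;

print @lines;
```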
> ########################################
> # INITIALIZATION OF SCALARS AND ARRAYS #
> ########################################
> 
> my $line;             # = scalar by which program steps through data
> my $fieldEval1;       # = holding scalar for evaluating whether the
>                        # first half of the word has an accent in it
> my $fieldEval2;       # = holding scalar for evaluating whether the
>                        # second half of the word has an accent in it
> my @field;            # = holding array for the split line

You should declare these variables where you use them to limit their
scope.


> #######################################################
> # FOREACH CONTROL TO READ THE INPUT FILE LINE BY LINE #
> #   AND MANIPULATE THE DESIRED DATA TO AN OUTPUT FILE #
> #######################################################
> 
> foreach $line (<INFO>) {

foreach my $line ( <INFO> ) {

But you should really be using a while loop to read from files.  foreach
(and its synonym for) evaluates <INFO> in list context, which reads the
whole file into a list in memory before the first iteration starts.  A
while loop reads just one line per iteration.

while ( my $line = <INFO> ) {
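A minimal sketch of that while loop, assuming a few made-up CSV lines in place of y.csv.  Opening a reference to a scalar gives an in-memory filehandle, so the example runs without any files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data standing in for y.csv (note the blank line).
my $data = "1,ka,ta,kata\n\n2,ma,ti,mati\n";
open my $info, '<', \$data or die "Cannot open in-memory file: $!\n";

my @seen;
# Each pass reads exactly one line, so memory use does not grow with
# the file size.  Perl wraps this condition in defined() for you.
while ( my $line = <$info> ) {
    chomp $line;
    next unless length $line;    # skip blank lines (a line "0" is kept)
    push @seen, $line;
}
close $info;

print "$_\n" for @seen;
```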


> # Assign the contents of the input file to $line one line at time for
> # evaluation.
>         chomp ($line);               # remove input field separator
>         next unless $line;           # skip blank lines
>         @field = split /,/, $line;   # Read each line as four fields split by commas
> 
> # Assign the second field to an evaluation scalar
>         $fieldEval1 = $field[1];
> # Assign the third field to an evaluation scalar
>         $fieldEval2 = $field[2];

You can assign to $fieldEval1 and $fieldEval2 directly from the split:

my ( undef, $fieldEval1, $fieldEval2 ) = split /,/, $line;

But you don't seem to use those variables later: your if statement
tests $field[1] and $field[2] directly.
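A quick check of that split with a made-up record (the field values are hypothetical).  The undef on the left discards the first field, and the fourth field is simply not assigned:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up record: line number, first part, second part, whole word.
my $line = '17,kat,ta,katta';

# undef throws away the line number; the next two fields land in named
# variables and the fourth field is dropped.
my ( undef, $fieldEval1, $fieldEval2 ) = split /,/, $line;

print "first part:  $fieldEval1\n";
print "second part: $fieldEval2\n";
```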


> # Test whether BOTH the second or third fields have accents in them
> # Accents are represented by the following characters: k K c ; ' [ { ] }
> # \ and |.
>         if ({$field[1] =~ /[kKc;'\[\{\]\}\\\|]/} && {$field[2] =~ /[kKc;\'\[\{\]\}\\\|]/}) {

Your problem is in this line (which warnings would have complained
about).  The braces {} around each pattern match create an anonymous
hash, so each operand of && is a reference to that hash.  A reference
is always true in boolean context, so the whole condition is always
true and every line gets printed.

if ( $fieldEval1 =~ /[][{};'\\|kKc]/ && $fieldEval2 =~ /[][{};'\\|kKc]/ )
{
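Wrapped in a subroutine, the corrected test can be tried on a few made-up fields.  both_accented is a hypothetical name for this sketch, and the character class is compiled once with qr// so both tests share it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The accent characters: k K c ; ' [ { ] } \ and |.
# A ']' placed first inside [...] is literal, so it needs no escape;
# the backslash still has to be doubled.
my $accent = qr/[][{};'\\|kKc]/;

# Hypothetical helper: true only when BOTH parts contain an accent.
sub both_accented {
    my ( $first, $second ) = @_;
    return $first =~ $accent && $second =~ $accent;
}

for my $pair ( [ 'kat', 'ta' ], [ 'ka', 'ti{' ], [ 'ma', 'ti' ] ) {
    my ( $first, $second ) = @$pair;
    printf "%-4s %-4s %s\n", $first, $second,
        both_accented( $first, $second ) ? 'print' : 'skip';
}
```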


Or you could probably simplify it by testing the whole line with one
regex that requires an accent somewhere in the second field and another
somewhere in the third:

if ( $line =~ /^[^,]*,[^,]*[][{};'\\|kKc][^,]*,[^,]*[][{};'\\|kKc]/ ) {


>             print OUT "$line\n";  # If so, print the line to file
>         }
> }
> 
> close (OUT);          # Close the output file
> close(INFO) ;         # Close the input file
> __END__



John
-- 
use Perl;
program
fulfillment
