Re: Multiple matching question

Rob Dixon Thu, 10 Aug 2006 09:11:19 -0700

Roman Daszczyszak wrote:
>
> I have several text files with a few thousand contacts in each, and I
> am trying to pull out all the contacts from certain email domains
> (about 15 of them).  I wrote a script that loops through each file,
> then loops through matching each domain to the line and writes the
> results to two files, one for matches, one for non-matches.
>
> I am just curious if there is a way to match all the domains in turn,
> without having a foreach looping through them?
>
> Here's my code:
> #!/perl/bin/perl
> use strict;
> use warnings;
>
> my $program_time = time();
> die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n" unless
> (@ARGV);
> my $domain_filename = "intel_addresses.txt";
> my @email_domains;
>
> open(DOMAINS, "<$domain_filename") or die "Cannot open $domain_filename:
> $!\n";
> chomp(@email_domains = <DOMAINS>);
> LINE: while (<>)
> {
>    my $filename = $ARGV;
>    $filename =~ s/\.csv//gi;
>    open(FOUND, ">>${filename}_match.csv") or die "Cannot open
> ${filename}_match.csv\n";
>    open(NOTFOUND, ">>${filename}_nomatch.csv") or die "Cannot open
> ${filename}_nomatch.csv\n";
>
>    foreach my $domain (@email_domains)
>    {
>        if (m/$domain/i)
>        {
>            print(FOUND $_);
>            next LINE;
>        }
>    }
>    print(NOTFOUND $_);
> }
> print("Run time: ",time() - $program_time,"\n");
> ---------------------------------------------------------------------------
>
> Additionally, does anyone know of a better way to open the results
> files, keeping the practice of making two files for each original,
> without having to reopen the file on each iteration of the while loop?
> Does reopening the file cause a performance hit each open?


Hi Roman

This is a quick post, sorry, I have to be somewhere. You can build a single
regex from the list of domains by joining them with a pipe (|), so each line
needs only one match instead of a foreach over the domains. I would still open
the result files inside the loop, but only the one you need for the current
line. Take a look at this code for some ideas. It's untested, but it compiles
and a quick scan picked up no errors.
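
As a stand-alone illustration of the pipe-joined regex, here is a minimal
sketch. The domain names and contact lines are made up for the example, and
the quotemeta call is an extra precaution that assumes the domain file holds
literal names rather than regex fragments.

```perl
use strict;
use warnings;

# Hypothetical domain list standing in for the contents of intel_addresses.txt.
my @domains = ('example.com', 'example.org');

# quotemeta escapes the dots so that "example.com" cannot match "exampleXcom".
my $alternation  = join '|', map { quotemeta } @domains;
my $domain_regex = qr/$alternation/i;

# Made-up contact lines, classified with a single match per line.
my @contacts = (
  'Alice,alice@EXAMPLE.com',
  'Bob,bob@elsewhere.net',
  'Carol,carol@examplexcom.net',
);

# Return 'match' if the line contains any listed domain, else 'nomatch'.
sub classify { return $_[0] =~ $domain_regex ? 'match' : 'nomatch' }

print "$_ => ", classify($_), "\n" for @contacts;
```

Compiling the alternation once with qr// means the loop does one match per
line, however many domains are in the list.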

HTH,

Rob


use strict;
use warnings;

my $program_time = time;
die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n" unless @ARGV;

my $domain_filename = 'intel_addresses.txt';
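my $domain_regex = do {
  open my $domains, '<', $domain_filename or die "Cannot open $domain_filename: $!";
  chomp(my @domains = <$domains>);
  # Skip blank lines (an empty alternative would match every line) and escape
  # metacharacters, assuming the file lists literal domain names.
  my $alternation = join '|', map { quotemeta } grep { length } @domains;
  qr/$alternation/i;    # /i keeps the case-insensitive match of your original
};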

while (<>) {

  my $filename = $ARGV;
  $filename =~ s/\.csv$//i;    # strip only a trailing .csv extension

  my $result_file = /$domain_regex/i
    ? "${filename}_match.csv"
    : "${filename}_nomatch.csv";
  open my $fh, '>>', $result_file or die "Cannot open $result_file: $!";
  print $fh $_;
  close $fh;
}

print("Run time: ",time() - $program_time,"\n");
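
If the repeated opens ever turn out to matter on larger inputs, one
alternative (not what the script above does) is to keep each result handle
open and reuse it. This is only a sketch: the %handle_for hash and the
handle_for and close_all subs are hypothetical names introduced for the
example, and it assumes the number of result files stays well below the
system's open-file limit.

```perl
use strict;
use warnings;

my %handle_for;    # result file name => open filehandle (hypothetical cache)

# Return an append-mode handle for $path, opening it only on first use.
sub handle_for {
  my ($path) = @_;
  return $handle_for{$path} if $handle_for{$path};
  open my $fh, '>>', $path or die "Cannot open $path: $!";
  return $handle_for{$path} = $fh;
}

# Inside the while loop this would replace the open/print/close trio:
#   print { handle_for($result_file) } $_;

# Close everything once the input is exhausted so buffered output is flushed.
sub close_all {
  for my $path (keys %handle_for) {
    close $handle_for{$path} or die "Cannot close $path: $!";
  }
  %handle_for = ();
}
```

Each input file produces at most two result files, so the cache stays small
for a handful of inputs; close_all would be called once after the while loop.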
