Re: Multiple matching question

John W. Krahn Thu, 10 Aug 2006 13:36:57 -0700

Roman Daszczyszak wrote:
> Hello all,

Hello,


> I have several text files with a few thousand contacts in each, and I
> am trying to pull out all the contacts from certain email domains
> (about 15 of them).  I wrote a script that loops through each file,
> then loops through matching each domain to the line and writes the
> results to two files, one for matches, one for non-matches.
> 
> I am just curious if there is a way to match all the domains in turn,
> without having a foreach looping through them?

Yes, there is, but it is usually considered slower than using a for loop.
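
As a rough sketch of the two approaches, the domains can be joined into a
single alternation and timed against the loop with the core Benchmark module.
The domain list, the sample lines and the sub names below are made up for
illustration, and the loop uses \Q...\E so that both versions match the same
literal text.  Which one wins depends on your Perl version and your data, so
measure before deciding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data, not taken from the original files.
my @email_domains = qw(example.com example.org example.net);
my @lines = (
    "Alice,alice\@example.com\n",
    "Bob,bob\@elsewhere.test\n",
    "Carol,carol\@EXAMPLE.ORG\n",
);

# One pattern for all domains; quotemeta makes "." match a literal dot.
my $alternation = join '|', map { quotemeta } @email_domains;
my $domains     = qr/$alternation/i;

# Approach 1: loop over the domains for every line.
sub match_with_loop {
    my ($line) = @_;
    for my $domain (@email_domains) {
        return 1 if $line =~ /\Q$domain\E/i;
    }
    return 0;
}

# Approach 2: a single precompiled alternation.
sub match_with_alternation {
    my ($line) = @_;
    return $line =~ $domains ? 1 : 0;
}

# Run each version for at least one CPU second and print a comparison table.
cmpthese( -1, {
    loop        => sub { match_with_loop($_)        for @lines },
    alternation => sub { match_with_alternation($_) for @lines },
} );
```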


> Here's my code:
> #!/perl/bin/perl
> use strict;
> use warnings;
> 
> my $program_time = time();

Perl already provides this value: the special variable $^T holds the time at
which the program started.
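
A minimal sketch of using it follows.  The English module's name for the same
variable is $BASETIME; the $elapsed variable is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);

# $^T is the program's start time in seconds since the epoch,
# so there is no need to save time() in a variable of your own.
my $elapsed = time() - $^T;
print "Run time: $elapsed\n";

# $BASETIME is the English.pm alias for $^T.
print "Same start time\n" if $BASETIME == $^T;
```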


> die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n" unless
> (@ARGV);
> my $domain_filename = "intel_addresses.txt";
> my @email_domains;
> 
> open(DOMAINS, "<$domain_filename") or die "Cannot open $domain_filename:
> $!\n";
> chomp(@email_domains = <DOMAINS>);
> LINE: while (<>)
> {
>    my $filename = $ARGV;
>    $filename =~ s/\.csv//gi;

Do you really want to remove ALL occurrences of /\.csv/i from the file name?
If you only want to remove /\.csv/i at the end of the file name (the file
name extension), you should anchor the pattern:

    $filename =~ s/\.csv\z//i;
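
To see the difference, here is a small sketch with a made-up file name that
contains ".csv" twice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name, chosen so that ".csv" appears twice.
my $name = 'contacts.csv.2006.CSV';

( my $unanchored = $name ) =~ s/\.csv//gi;     # removes every ".csv"
( my $anchored   = $name ) =~ s/\.csv\z//i;    # removes only the trailing one

print "$unanchored\n";    # contacts.2006
print "$anchored\n";      # contacts.csv.2006
```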


>    open(FOUND, ">>${filename}_match.csv") or die "Cannot open
> ${filename}_match.csv\n";
>    open(NOTFOUND, ">>${filename}_nomatch.csv") or die "Cannot open
> ${filename}_nomatch.csv\n";
> 
>    foreach my $domain (@email_domains)
>    {
>        if (m/$domain/i)
>        {
>            print(FOUND $_);
>            next LINE;
>        }
>    }
>    print(NOTFOUND $_);
> }
> print("Run time: ",time() - $program_time,"\n");
> ---------------------------------------------------------------------------
> 
> Additionally, does anyone know of a better way to open the results
> files, keeping the practice of making two files for each original,
> without having to reopen the file on each iteration of the while loop?
> Does reopening the file cause a performance hit each open?

You probably want something like:

#!/perl/bin/perl
use strict;
use warnings;

die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n"
    unless @ARGV;
my $domain_filename = 'intel_addresses.txt';

open DOMAINS, '<', $domain_filename
    or die "Cannot open $domain_filename: $!\n";
my $email_domains = join '|', map { chomp; quotemeta } <DOMAINS>;

my $domains = qr/$email_domains/i;

while ( <> ) {
    if ( $. == 1 ) {   # only open once at beginning
        ( my $filename = $ARGV ) =~ s/\.csv\z//i;

        open FOUND,    '>', "${filename}_match.csv"
            or die "Cannot open ${filename}_match.csv: $!\n";
        open NOTFOUND, '>', "${filename}_nomatch.csv"
            or die "Cannot open ${filename}_nomatch.csv: $!\n";
        }

    print { /$domains/ ? *FOUND : *NOTFOUND } $_;  # globs are safe under strict

    close ARGV if eof;  # must close so $. will work correctly
    }

print 'Run time: ', time() - $^T, "\n";
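
The "$. == 1" test relies on "close ARGV if eof": without the explicit close,
$. keeps counting across all the files named in @ARGV and is 1 only for the
very first one.  A small sketch of that behaviour, using throwaway temporary
files and made-up contents in place of your CSV files:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small throwaway input files (hypothetical data).
my @files;
for my $content ( "a1\na2\n", "b1\nb2\nb3\n" ) {
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!\n";
    push @files, $path;
}

my @first_lines;    # lines seen while $. == 1
local @ARGV = @files;
while ( <> ) {
    push @first_lines, $_ if $. == 1;
    close ARGV if eof;    # resets $. before the next file starts
}

print @first_lines;    # a1 and b1: one "first line" per file
```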





John
-- 
use Perl;
program
fulfillment
