Roman Daszczyszak wrote:
>
> I have several text files with a few thousand contacts in each, and I
> am trying to pull out all the contacts from certain email domains
> (about 15 of them). I wrote a script that loops through each file,
> then loops through matching each domain to the line and writes the
> results to two files, one for matches, one for non-matches.
>
> I am just curious if there is a way to match all the domains in turn,
> without having a foreach looping through them?
>
> Here's my code:
> #!/perl/bin/perl
> use strict;
> use warnings;
>
> my $program_time = time();
> die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n" unless (@ARGV);
> my $domain_filename = "intel_addresses.txt";
> my @email_domains;
>
> open(DOMAINS, "<$domain_filename") or die "Cannot open $domain_filename: $!\n";
> chomp(@email_domains = <DOMAINS>);
> LINE: while (<>)
> {
>     my $filename = $ARGV;
>     $filename =~ s/\.csv//gi;
>     open(FOUND, ">>${filename}_match.csv") or die "Cannot open ${filename}_match.csv\n";
>     open(NOTFOUND, ">>${filename}_nomatch.csv") or die "Cannot open ${filename}_nomatch.csv\n";
>
>     foreach my $domain (@email_domains)
>     {
>         if (m/$domain/i)
>         {
>             print(FOUND $_);
>             next LINE;
>         }
>     }
>     print(NOTFOUND $_);
> }
> print("Run time: ",time() - $program_time,"\n");
> ---------------------------------------------------------------------------
>
> Additionally, does anyone know of a better way to open the results
> files, keeping the practice of making two files for each original,
> without having to reopen the file on each iteration of the while loop?
> Does reopening the file cause a performance hit each open?

Hi Roman

This is a quick post, sorry, I have to be somewhere.

You can build a regex from the list of domains by joining them with a
pipe. Also, I would stick with opening the files in the loop, but only
open the one you need.

Take a look at this code for some ideas. It's untested but compiles and a
quick scan picked up no errors.

HTH,

Rob

use strict;
use warnings;

my $program_time = time;
die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n" unless @ARGV;

my $domain_filename = 'intel_addresses.txt';

# Read the domain list once and join it into a single alternation.
my $domain_regex = do {
    open my $domains, '<', $domain_filename
        or die "Cannot open $domain_filename: $!";
    chomp(my @domains = <$domains>);
    my $pattern = join '|', @domains;
    qr/$pattern/i;    # /i keeps the case-insensitive match of your script
};

while (<>) {
    my $filename = $ARGV;
    $filename =~ s/\.csv//gi;

    # Pick the one output file this line belongs in, and open only that.
    my $result_file = /$domain_regex/
        ? "${filename}_match.csv"
        : "${filename}_nomatch.csv";

    open my $fh, '>>', $result_file or die "Cannot open $result_file: $!";
    print $fh $_;
    close $fh;
}

print("Run time: ", time() - $program_time, "\n");
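
Two follow-up ideas, as a rough sketch rather than a tested drop-in. It assumes the domain file holds literal domain names (not regex patterns), so each one is escaped with quotemeta before joining. The second helper covers the question about reopening files by caching one append handle per output path. The names build_domain_regex and handle_for are made up for this example.

```perl
use strict;
use warnings;

# Build one case-insensitive regex from a list of literal domain names.
# quotemeta() escapes the dots, so "intel.com" cannot match "intelXcom".
# Blank entries are dropped: an empty alternative would match every line.
sub build_domain_regex {
    my @domains = grep { length } @_;
    return qr/(?!)/ unless @domains;    # no domains: a regex that never matches
    my $pattern = join '|', map { quotemeta($_) } @domains;
    return qr/$pattern/i;
}

# Hypothetical helper: return a cached append handle for $path,
# opening the file only the first time that path is requested.
my %handle_for;
sub handle_for {
    my ($path) = @_;
    unless ($handle_for{$path}) {
        open my $fh, '>>', $path or die "Cannot open $path: $!";
        $handle_for{$path} = $fh;
    }
    return $handle_for{$path};
}
```

Inside the while loop, the open/print/close trio would then shrink to `print { handle_for($result_file) } $_;`. The handles stay open until the program exits, which should be fine for a handful of input files but is worth keeping in mind if the file list grows large.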