On Wed, Mar 26, 2008 at 11:18 AM, <[EMAIL PROTECTED]> wrote:
> I have two sorted files (one string per line).
> [I'd also like to know how to solve this if the lists weren't sorted
> (as complemented sets).]
> I want to output the List1 items not found in the List2 file.
> grep is too slow.
> diff gets stuck because list2 has millions of items.
>
> for example:
> List1.txt contains:
> a
> aa <- only in List1.txt not in List2.txt
> b
> bb
>
> List2.txt contains millions of items:
> a
> aaa
> aaaa
> b
> bb
> ...
> zzzzzzz
>
> Ideally:
> > perlscr List1.txt List2.txt
>
> would output:
>
> aa
>
>
> Any help is appreciated.
>
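#!/usr/bin/perl
use strict;
use warnings;

my ($smallfile, $largefile) = @ARGV;

# Load every line of the small file into a hash; since the file is
# sorted, $max ends up holding its last (greatest) line.
open my $small, "<", $smallfile
    or die "could not open $smallfile: $!";
my $max;
my %seen = map { $max = $_; $_ => 0 } <$small>;
close $small;

# Scan the large file, counting hits, and stop once we pass $max.
open my $large, "<", $largefile
    or die "could not open $largefile: $!";
while (<$large>) {
    last if $_ gt $max;
    $seen{$_}++ if exists $seen{$_};
}
close $large;

# Print the small-file lines that never showed up in the large file.
print grep { not $seen{$_} } sort keys %seen;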
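
For the unsorted case, a rough sketch rather than a tested solution: the early exit on $max only works when both files are sorted the same way, so this variant drops it. It also chomps each line, so a missing final newline does not cause a false mismatch, and it prints in List1 order instead of sorted order. The sub name not_in_large is made up for illustration, and the whole of List1 is still assumed to fit in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): returns the lines of
# $smallfile that never appear in $largefile. Neither file has to be sorted.
sub not_in_large {
    my ($smallfile, $largefile) = @_;

    open my $small, "<", $smallfile
        or die "could not open $smallfile: $!";
    chomp(my @order = <$small>);    # remember List1 order for the output
    close $small;

    # Every List1 line starts out as "missing".
    my %missing = map { $_ => 1 } @order;

    open my $large, "<", $largefile
        or die "could not open $largefile: $!";
    while (my $line = <$large>) {
        chomp $line;
        delete $missing{$line};     # found it, so it is no longer missing
        last unless %missing;       # stop early once nothing is left
    }
    close $large;

    # delete returns true only the first time, so duplicates print once.
    return grep { delete $missing{$_} } @order;
}

# Run as a script only when two file names are given.
if (@ARGV == 2) {
    print "$_\n" for not_in_large(@ARGV);
}
```

Only List1 is held in memory; List2 is read one line at a time, so its size should not matter much beyond the time it takes to read it.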
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.