On Wed, Mar 26, 2008 at 11:18 AM, <[EMAIL PROTECTED]> wrote:
> I have two sorted files (one string per line).
> [I'd also like to know how to solve this if the lists weren't sorted
> (as complemented sets).]
> I want to output the List1 items not found in the List2 file.
> grep is too slow.
> diff gets stuck because List2 has millions of items.
>
> For example:
> List1.txt contains:
> a
> aa <- only in List1.txt, not in List2.txt
> b
> bb
>
> List2.txt contains millions of items:
> a
> aaa
> aaaa
> b
> bb
> ...
> zzzzzzz
>
> Ideally:
>
> perlscr List1.txt List2.txt
>
> would output:
>
> aa
>
>
> Any help is appreciated.
>
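For the bracketed question about unsorted lists: without a shared sort order there is no point at which reading List2 can stop early, but a hash lookup still does the job on its own. A minimal sketch, assuming List1 fits comfortably in memory (List2 is only streamed, never stored); the sub name `lines_not_in` is hypothetical:

```perl
#!/usr/bin/perl
# Sketch for the unsorted case: load List1 into a hash, stream List2,
# delete every List1 line that shows up in List2, and print what is left.
use strict;
use warnings;

# Hypothetical helper: returns the lines read from $small_fh that never
# appear in $large_fh. Neither input needs to be sorted.
sub lines_not_in {
    my ($small_fh, $large_fh) = @_;
    my %want;
    while (my $line = <$small_fh>) {
        chomp $line;
        $want{$line} = 1;
    }
    while (my $line = <$large_fh>) {
        chomp $line;
        delete $want{$line};
        last unless %want;    # every List1 line was found, stop reading
    }
    return sort keys %want;
}

# Run as a script: perl unsorted.pl List1.txt List2.txt
if (!caller && @ARGV == 2) {
    my ($smallfile, $largefile) = @ARGV;
    open my $small, "<", $smallfile or die "could not open $smallfile: $!";
    open my $large, "<", $largefile or die "could not open $largefile: $!";
    print "$_\n" for lines_not_in($small, $large);
}
```

The only shortcut left is stopping once every List1 line has been seen; otherwise all of List2 has to be read.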
#!/usr/bin/perl

use strict;
use warnings;

my ($smallfile, $largefile) = @ARGV;

open my $small, "<", $smallfile or die "could not open $smallfile: $!";
# chomp so that a missing newline on the last line cannot break a match
chomp(my @small = <$small>);
close $small;

# List1 is sorted, so its last line is the largest value we care about
my $max = $small[-1];
my %seen = map { $_ => 0 } @small;

open my $large, "<", $largefile or die "could not open $largefile: $!";
while (<$large>) {
    chomp;
    # both files are sorted (in byte order, e.g. LC_ALL=C sort), so once
    # List2 is past $max nothing further can match a List1 line
    last if $_ gt $max;
    $seen{$_}++ if exists $seen{$_};
}
close $large;

print "$_\n" for grep { not $seen{$_} } sort keys %seen;

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
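Because both files are sorted, a merge pass is another option: walk the two files in step, holding one line from each, and report the List1 lines that List2 skips over. A minimal sketch, assuming both files are sorted in plain byte order (for example with `LC_ALL=C sort`), which is the order Perl's `lt` and `eq` compare in; `sorted_difference` is a hypothetical name. From the shell, `comm -23 List1.txt List2.txt` does the same job, provided both files were sorted in the collation order `comm` expects.

```perl
#!/usr/bin/perl
# Sketch of a merge pass over two files that are both sorted in byte order.
# Only one line from each file is held at a time; memory grows only with
# the number of List1 lines being reported.
use strict;
use warnings;

# Hypothetical helper: returns the lines of $small_fh not found in $large_fh.
sub sorted_difference {
    my ($small_fh, $large_fh) = @_;
    my @missing;
    my $big = <$large_fh>;
    chomp $big if defined $big;
    while (my $line = <$small_fh>) {
        chomp $line;
        # advance List2 until it is no longer behind the current List1 line
        while (defined $big && $big lt $line) {
            $big = <$large_fh>;
            chomp $big if defined $big;
        }
        # List2 either matches this line exactly or has already passed it
        push @missing, $line unless defined $big && $big eq $line;
    }
    return @missing;
}

# Run as a script: perl merge.pl List1.txt List2.txt
if (!caller && @ARGV == 2) {
    my ($smallfile, $largefile) = @ARGV;
    open my $small, "<", $smallfile or die "could not open $smallfile: $!";
    open my $large, "<", $largefile or die "could not open $largefile: $!";
    print "$_\n" for sorted_difference($small, $large);
}
```

If either file is not really sorted in byte order, this pass silently reports wrong results, so the hash-based approach is the safer default when the sort order is in doubt.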