On Wed, Mar 26, 2008 at 11:18 AM,  <[EMAIL PROTECTED]> wrote:
> I have two sorted files (one string per line).
>  [I'd also like to know how to solve this if the lists weren't sorted
>  (as complemented sets).]
>  I want to output the List1 items not found in the List2 file.
>  grep is too slow.
>  diff gets stuck because list2 has millions of items.
>
>  for example:
>  List1.txt contains:
>  a
>  aa     <- only in List1.txt not in List2.txt
>  b
>  bb
>
>  List2.txt contains millions of items:
>  a
>  aaa
>  aaaa
>  b
>  bb
>  ...
>  zzzzzzz
>
>  Ideally:
>  > perlscr List1.txt List2.txt
>
>  would output:
>
>  aa
>
>
>  Any help is appreciated.
>

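#!/usr/bin/perl

use strict;
use warnings;

my ($smallfile, $largefile) = @ARGV;

open my $small, "<", $smallfile
        or die "could not open $smallfile: $!";

# Load the small file into a hash. The input is sorted, so the last
# line read is also the largest string.
my ($max, %seen);
while (<$small>) {
        chomp;  # so a missing final newline cannot break the comparison
        $seen{$_} = 0;
        $max = $_;
}
close $small;

open my $large, "<", $largefile
        or die "could not open $largefile: $!";

while (<$large>) {
        chomp;
        # Everything past $max sorts after all of the small file.
        last if defined $max and $_ gt $max;
        $seen{$_}++ if exists $seen{$_};
}
close $large;

# Print the small-file lines that never showed up in the large file.
print "$_\n" for grep { not $seen{$_} } sort keys %seen;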
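
For the unsorted case you also asked about, here is a minimal sketch. It
assumes List1 is small enough to hold in memory and that List2 can simply
be read once from start to finish. Without sorted input there is no $max
shortcut, so the loop can only stop early once every List1 item has been
found. The name only_in_first is made up for this example, and results
come out in List1's own order with duplicates collapsed.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns the lines of
# $smallfile that never appear in $largefile, in $smallfile's own order.
sub only_in_first {
    my ($smallfile, $largefile) = @_;

    open my $small, "<", $smallfile
        or die "could not open $smallfile: $!";
    my (@order, %missing);
    while (my $line = <$small>) {
        chomp $line;
        next if exists $missing{$line};   # skip duplicate lines
        push @order, $line;
        $missing{$line} = 1;
    }
    close $small;

    open my $large, "<", $largefile
        or die "could not open $largefile: $!";
    while (my $line = <$large>) {
        chomp $line;
        delete $missing{$line};           # found, so no longer missing
        last unless %missing;             # nothing left to look for
    }
    close $large;

    # Whatever is still in %missing was never seen in the large file.
    return grep { $missing{$_} } @order;
}

# Run as a script only when invoked directly with two file names.
if (!caller and @ARGV == 2) {
    print "$_\n" for only_in_first(@ARGV);
}
```

Memory use depends only on the size of List1, so the millions of lines in
List2 are never held in memory at once.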



-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
