On Wed, Mar 26, 2008 at 8:18 AM, <[EMAIL PROTECTED]> wrote:

> I have two sorted files (one string per line).
> [I'd also like to know how to solve this if the lists weren't sorted
> (as complemented sets).]
> I want to output the List1 items not found in the List2 file.
> grep is too slow.
> diff gets stuck because list2 has millions of items.

If the lists aren't sorted, it's probably best to read the second list
(the list of filters) into a hash. But since they're sorted, and because
you have many filters, it's more efficient to read the files in parallel.

My first draft of this program used this line to implement the inner loop:

    $current_filter = <FILTERS> while $item gt $current_filter;

...but then I realized that the second file could run out of filters
before the first one runs out of data, so it had to become more complex:

    #!/usr/bin/perl

    use strict;
    use warnings;

    die "Usage: $0 data_file filters_file\n" unless @ARGV == 2;
    my($data_file, $filters) = @ARGV;

    open DATA_FILE, '<', $data_file or die "Can't read '$data_file': $!";
    open FILTERS, '<', $filters or die "Can't read '$filters': $!";

    # Note: gt compares in plain string order, so both files must be
    # sorted that way (e.g. with LC_ALL=C sort).
    my $current_filter = '';

    # outer loop reads a line at a time
    DATA_LINE: while (my $item = <DATA_FILE>) {

      # inner loop updates the filter, if needed
      # This inner loop would be just this line:
      ### $current_filter = <FILTERS> while $item gt $current_filter;
      # ...except that we have to allow for the filters to run out.
      while ($item gt $current_filter) {
        if (defined($current_filter = <FILTERS>)) {
          # a filter was read from the file: normal case
        } else {
          # No more filters; print everything else
          print $item;
          print while <DATA_FILE>;
          last DATA_LINE;
        }
      }

      # the inner loop has now updated $current_filter
      print $item unless $item eq $current_filter;
    }

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
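
For the unsorted case mentioned at the top of this reply, the hash approach might look something like the sketch below. It is only an illustration: the helper name print_unfiltered and the lexical filehandles are my own choices, not part of the program above. It assumes the whole filter list fits in memory, which may or may not hold for millions of lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): print to $out_fh every
# line of $data_fh that does not appear in $filters_fh. No sorting needed.
sub print_unfiltered {
    my ($data_fh, $filters_fh, $out_fh) = @_;

    # Load every filter line into a hash so each lookup is a single probe.
    my %is_filter;
    while (my $filter = <$filters_fh>) {
        chomp $filter;
        $is_filter{$filter} = 1;
    }

    # Stream the data file, keeping only lines that are not filters.
    while (my $item = <$data_fh>) {
        chomp $item;
        print {$out_fh} "$item\n" unless exists $is_filter{$item};
    }
    return;
}

# Run as a script only when exactly two file names are given.
if (@ARGV == 2) {
    my ($data_file, $filters) = @ARGV;
    open my $data_fh,    '<', $data_file or die "Can't read '$data_file': $!";
    open my $filters_fh, '<', $filters   or die "Can't read '$filters': $!";
    print_unfiltered($data_fh, $filters_fh, \*STDOUT);
}
```

Only the filter list is held in memory; the data file is still read a line at a time, and the output keeps the data file's original order.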