On Wed, Mar 26, 2008 at 8:18 AM,  <[EMAIL PROTECTED]> wrote:

> I have two sorted files (one string per line).
>  [I'd also like to know how to solve this if the lists weren't sorted
>  (as complemented sets).]
>  I want to output the List1 items not found in the List2 file.
>  grep is too slow.
>  diff gets stuck because list2 has millions of items.

If the lists aren't sorted, it's probably best to read the second list
(the list of filters) into a hash. But since they're sorted, and
because you have many filters, it's more efficient to read the files
in parallel.
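
For the unsorted case, here is a minimal sketch of the hash approach.
To keep it self-contained it works on two small in-memory lists rather
than files; the sub name items_not_in and the sample data are made up
for illustration. With real files you would probably fill the hash one
line at a time from the filter file, at a memory cost that grows with
the number of filters.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: return the items of @$items that do not appear
# in @$filters. Neither list has to be sorted; the original order of
# @$items is preserved.
sub items_not_in {
    my ($items, $filters) = @_;

    # Hash slice: create one key per filter (the values are unused).
    my %is_filter;
    @is_filter{@$filters} = ();

    return grep { !exists $is_filter{$_} } @$items;
}

# Made-up sample data, deliberately unsorted.
my @list1 = qw(pear apple fig kiwi);
my @list2 = qw(kiwi banana apple);

print "$_\n" for items_not_in(\@list1, \@list2);
```

With the sample data above this prints pear and fig, in the order they
appear in the first list.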

My first draft of this program used this line to implement the inner loop:

    $current_filter = <FILTERS> while $item gt $current_filter;

... but then I realized that the second file could run out of filters
before the first one runs out of data, so it had to become more
complex:

    #!/usr/bin/perl

    use strict;
    use warnings;

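    die "Usage: $0 data_file filter_file\n" unless @ARGV == 2;
    my($data_file, $filters) = @ARGV;

    # Three-argument open, so an odd filename can't be taken as a mode or a pipe
    open DATA_FILE, '<', $data_file or die "Can't read '$data_file': $!";
    open FILTERS, '<', $filters or die "Can't read '$filters': $!";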

    my $current_filter = '';

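    # Both files must be sorted in the plain string order that gt uses
    # (for example, with LC_ALL=C sort).
    # outer loop reads a line at a time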
  DATA_LINE:
    while (my $item = <DATA_FILE>) {

      # inner loop updates the filter, if needed
      # This inner loop would be just this line:
      ### $current_filter = <FILTERS> while $item gt $current_filter;
      # .... except that we have to allow for the filters to run out.
      while ($item gt $current_filter) {
        if (defined($current_filter = <FILTERS>)) {
          # a filter was read from the file: normal case
        } else {
          # No more filters; print everything else
          print $item;
          print while <DATA_FILE>;
          last DATA_LINE;
        }
      }

      # the inner loop has now updated $current_filter
      print $item unless $item eq $current_filter;
    }
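
To check the "filters run out" case by hand, the same parallel walk can
be sketched over two small sorted lists in memory. This is only an
illustration, not a replacement for the program above: the sub name
sorted_difference and the sample data are invented, and arrays stand in
for the two files.

```perl
use strict;
use warnings;

# Hypothetical helper: the same walk as the program above, but over two
# array refs that are already sorted in plain string order.
sub sorted_difference {
    my ($items, $filters) = @_;
    my @out;
    my $next    = 0;     # index of the next unread filter
    my $current = '';    # plays the role of $current_filter

  ITEM:
    for my $n (0 .. $#$items) {
        my $item = $items->[$n];

        # Advance the filter until it is no longer less than the item.
        while ($item gt $current) {
            if ($next > $#$filters) {
                # No more filters: keep this item and everything after it.
                push @out, @{$items}[$n .. $#$items];
                last ITEM;
            }
            $current = $filters->[$next++];
        }

        push @out, $item unless $item eq $current;
    }
    return @out;
}

# The filters (b c d) run out while f and g are still unread.
print join(' ', sorted_difference([qw(a b d f g)], [qw(b c d)])), "\n";
```

This prints "a f g": b and d are filtered out, and f and g pass through
once the filter list is exhausted.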

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
