On Tue, 1 Oct 2002, Patricia Hinman wrote:

> I wanted to do this without completely loading the
> page into an array and then checking for duplicates.

The entire file ends up in memory only when every line in the file is 
unique. Even in that case, a hash cuts down the number of comparisons.

> 
> So I just inserted some flags into my while statement.
> 
> #!/usr/bin/perl -w
> $dupli_file = <STDIN>;
> chomp ($dupli_file );
> $res_file = <STDIN>;
> chomp ($res_file  );
> open (DUPLI, "./$dupli_file") or die "Failed to open
> $dupli_file:\n$!";
> my @fil1;my$flag;
> while (<DUPLI>) {
>   #this sets an initial value so the foreach is
> evaluated
>   if(!$flag){push(@fil1,$_);print "first push :
> $_\n";$flag++}

You can actually do away with the $flag variable by reading the first line
before the while loop and pushing it into @fil1.
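
A minimal sketch of that idea, keeping your array scan as it is. The
helper name unique_lines_scan and the in-memory sample data are made up
for illustration; with a real file you would pass the DUPLI handle instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the unique lines read from filehandle $fh,
# using the same array-scan approach as the quoted script.
sub unique_lines_scan {
    my ($fh) = @_;
    my @fil1;

    # Prime the array with the first line, so no $flag is needed.
    my $first = <$fh>;
    return @fil1 unless defined $first;   # empty input
    push @fil1, $first;

    LINE: while (my $line = <$fh>) {
        foreach my $seen (@fil1) {
            next LINE if $line eq $seen;  # duplicate: skip this line
        }
        push @fil1, $line;                # not seen before: keep it
    }
    return @fil1;
}

# Demo on an in-memory filehandle (sample data is made up).
my $data = "a\nb\na\nc\nb\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!\n";
print unique_lines_scan($in);
close $in;
```

This still does the quadratic scan; it only removes the flag.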

> #allow it to push if $pushit isn't reset to 0
> my$pushit = 1;
>   foreach $line (@fil1){
>     if($_ eq $line){$pushit =0;last;}
>   }

In the worst case (there are no duplicate lines) this compares each 
line with every line already stored in @fil1.
That is an O(n^2) solution, where n is the number of lines in the file.

The combination of flags and an array can be replaced by a hash. 
A hash also gives an O(1) lookup, on average, for the 'is it a duplicate' 
test.

In principle this is very similar to the perlfaq4 entry
'How can I remove duplicate elements from a list or array?'
(try perldoc -q duplicate).
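
For a list that is already in memory, the hash idiom can be sketched
roughly like this. The helper name uniq_in_order and the sample list are
assumptions for illustration, not taken from the FAQ text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the elements of a list with duplicates
# removed, keeping the first occurrence of each and the original order.
sub uniq_in_order {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a value appears,
    # so grep keeps only first occurrences.
    return grep { !$seen{$_}++ } @_;
}

my @lines  = ("a\n", "b\n", "a\n", "c\n", "b\n");   # made-up sample data
my @unique = uniq_in_order(@lines);
print @unique;
```

Each element costs one hash lookup instead of a scan over everything
kept so far.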

> unless($flag == 0){if($pushit){push(@fil1,$_);print
> "pushed $_\n"}}
> }
> 
> close(DUPLI);
> print "RESULT\n";
> open (RESULT, ">./$res_file") or die "Failed to open
> $res_file for writing:\n $!";
> print RESULT @fil1;
> close (RESULT);
> 
> __END__
> 

> > A hash is more suited for your job
> > #!/usr/bin/perl -w
> > use strict;
> > 
> > chomp (my $dupli_file = <STDIN>);
> > chomp (my $res_file = <STDIN>);
> > 
> > open (DUPLI, $dupli_file) or die "Failed to open
> > $dupli_file: $!\n";
> > open (RESULT, ">$res_file") or die "Failed to open
> > $res_file for writing: $!\n";
> > 
> > my %res_hash;
> > while (<DUPLI>) {
> >     chomp;

I missed the empty-line check here too. Add this right after the chomp:
next if (/^$/);

> >     unless ($res_hash{$_}++) {
> >         print RESULT "$_\n";
> >     }
> > }
> > close (DUPLI);
> > close (RESULT);
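
Putting the pieces together, here is a rough sketch of the same loop with
the empty-line check in place. The helper name dedupe_stream and the
in-memory sample data are made up so that the example runs on its own;
lexical filehandles and three-argument open are my choice here, not
something the script above requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, skipping empty lines
# and any line that has already been written once.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %res_hash;
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^$/;              # the empty-line check
        # Post-increment returns 0 the first time a line is seen.
        print {$out} "$line\n" unless $res_hash{$line}++;
    }
    return;
}

# Demo with in-memory filehandles (sample data is made up).
my $input  = "one\n\ntwo\none\n\nthree\ntwo\n";
my $output = '';
open my $in,  '<', \$input  or die "Failed to open input: $!\n";
open my $out, '>', \$output or die "Failed to open output: $!\n";
dedupe_stream($in, $out);
close $in;
close $out;
print $output;
```

With real files you would open $dupli_file for reading and $res_file for
writing and pass those two handles to the same routine.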


