On Monday 12 July 2004 10:59, Tang, Hannah (NIH/NLM) wrote:
>
> Hi,

Hello,

> I have two big text files, I need to read one line from first
> file, write some information from this line to a new file, and search
> second file to find lines with same control_id, and write more
> information to the new file, I wrote in perl, but it tooks half day
> to finish joining the two files. Do you have any suggest?
>
> Below are some of my code.
> ====================================================================
>
> #!/usr/bin/perl -w
> #use IO::FILE;
> #use strict 'subs';
> #
>
> $file1="file1.txt";
> $file2="file2.txt";
>
> open (SOURCE, "$file1")
>     or die "can't open the $file1: $!";
>
> while (<SOURCE>) {
>     $control_id = substr($_, 0, 22);
>
>     open (SINK, ">>newFile.dat")
>         or die "can't open the newFile.dat: $!";

You don't need to open this file inside the loop. Open it once before
the loop starts.

>     print SINK $control_id;
>     #write more to newFile.dat
>
>     open (ADDSOURCE, "$file2")
>         or die "can't open the $file2: $!";
>
>     while (<ADDSOURCE>) {
>         if ($_ =~ /^$control_id/) {
>             print SINK substr($_, 31, 3);
>             #write more to newFile.dat
>             $weight = substr($_, 48, 7);
>             $totalWeight += $weight;
>             $_ = <ADDSOURCE>;
>             while ($_ =~ /^$control_id/) {
>                 print SINK substr($_, 31, 3);
>                 #write more to newFile.dat
>                 $weight = substr($_, 48, 7);
>                 $totalWeight += $weight;
>                 $_ = <ADDSOURCE>;
>             }#end of while
>             print SINK "$totalWeight";
>             seek(ADDSOURCE, 0, 2)
>                 or die "Couldn't seek to the end: $!\n";
>
>         }#end of if
>     }#end of while for ADDSOURCE

You are doing way too much inside the while loop. This may not speed up
your program but it will make it a lot easier to read.
:-)

    while ( <ADDSOURCE> ) {
        next unless /^$control_id/;
        print SINK substr $_, 31, 3;
        #write more to newFile.dat
        $totalWeight += substr $_, 48, 7;
        }
    print SINK $totalWeight;

>     close(ADDSOURCE) or die "can't close $ADDSOURCE: $!\n";
>     close(SINK) or die "can't close $SINK: $!\n";
> } #end of while for SOURCE
> close(SOURCE) or die "can't close $SOURCE: $!\n";

Can you fit all of the control ids from "file1.txt" into an array or
hash in memory? Perhaps a tied hash will help.

#!/usr/bin/perl -w
use strict;

# UNTESTED !!

my $file1 = 'file1.txt';
my $file2 = 'file2.txt';

open SOURCE, $file1 or die "can't open the $file1: $!";

my ( $order, %control_ids );
while ( <SOURCE> ) {
    $control_ids{ substr $_, 0, 22 } = {
        order  => ++$order,
        field  => [],   # don't know what to call this?
        weight => 0,
        };
    }

close SOURCE or die "can't close $file1: $!\n";

open ADDSOURCE, $file2 or die "can't open the $file2: $!";

while ( <ADDSOURCE> ) {
    my $id = substr $_, 0, 22;
    next unless exists $control_ids{ $id };
    push @{ $control_ids{ $id }{ field } }, substr $_, 31, 3;
    $control_ids{ $id }{ weight } += substr $_, 48, 7;
    }

close ADDSOURCE or die "can't close $file2: $!\n";

open SINK, '>>newFile.dat' or die "can't open the newFile.dat: $!";

# write the ids out in the order they were read from file1.txt
for my $id ( sort { $control_ids{ $a }{ order } <=> $control_ids{ $b }{ order } }
             keys %control_ids ) {
    print SINK $id, @{ $control_ids{ $id }{ field } }, $control_ids{ $id }{ weight };
    }

close SINK or die "can't close newFile.dat: $!\n";

__END__

John
-- 
use Perl;
program
fulfillment
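
P.S. Here is a small sketch of the same hash-join idea that runs against
in-memory lines instead of the two files, so it can be tried without any
data. The names make_line and join_on_id and the sample ids are made up
for illustration. The offsets (0/22, 31/3, 48/7) are taken from the code
above. It keeps the ids in an array in first-seen order rather than
sorting on a counter, which is my own assumption about what is wanted,
so treat it as an example only and not a drop-in replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one fixed-width line with the control id at
# offset 0 (22 chars), a 3-char code at offset 31 and a 7-char weight at
# offset 48, matching the substr offsets used in this thread.
sub make_line {
    my ( $id, $code, $weight ) = @_;
    return sprintf "%-22s%9s%-3s%14s%7.2f\n", $id, '', $code, '', $weight;
}

# Join two lists of lines on the leading 22-character control id.
# Returns one string per id, in the order the ids first appear in @$first.
sub join_on_id {
    my ( $first, $second ) = @_;

    my ( @order, %rec );
    for my $line ( @$first ) {
        my $id = substr $line, 0, 22;
        next if exists $rec{ $id };     # keep the first occurrence only
        push @order, $id;
        $rec{ $id } = { field => [], weight => 0 };
    }

    # One pass over the second list; each line is looked up by its id.
    for my $line ( @$second ) {
        my $id = substr $line, 0, 22;
        next unless exists $rec{ $id };
        push @{ $rec{ $id }{ field } }, substr $line, 31, 3;
        $rec{ $id }{ weight } += substr $line, 48, 7;
    }

    my @out;
    for my $id ( @order ) {
        push @out, join '', $id, @{ $rec{ $id }{ field } }, $rec{ $id }{ weight };
    }
    return @out;
}

# Made-up sample data: two ids in the first list, four lines in the second.
my @first  = ( make_line( 'ID-A', 'xxx', 0 ), make_line( 'ID-B', 'xxx', 0 ) );
my @second = (
    make_line( 'ID-B', 'b01', 1.5 ),
    make_line( 'ID-A', 'a01', 2.25 ),
    make_line( 'ID-B', 'b02', 2.25 ),
    make_line( 'ID-C', 'c01', 9 ),      # no match in @first, skipped
);

print "$_\n" for join_on_id( \@first, \@second );
```

Note that the second list does not have to be sorted or grouped by
control id, because every line is looked up in the hash directly. Each
file is read exactly once.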