I had the following task: open a file, read it, and merge every pair of consecutive lines that contain a certain number of tabs. Example (fields separated by tabs):
Blablabla
abc	cab	bca
123	453	756
Blablabla
Blablabla
Here, lines 2 and 3 should be merged, while the other lines should remain untouched. Expected result:
Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla
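The merge step itself is a column-wise "zip" of two tab-separated lines: field 1 of the first line is joined with field 1 of the second, and so on. A minimal sketch, assuming both lines have the same number of fields; the helper name merge_pair is hypothetical and not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: pair up the tab-separated fields of two lines
# column by column and return one "field1 field2" string per column.
# Assumes both lines have the same number of fields.
sub merge_pair {
    my ($first_line, $second_line) = @_;
    my @first  = split /\t/, $first_line;
    my @second = split /\t/, $second_line;
    return map { join(' ', $first[$_], $second[$_]) } 0 .. $#first;
}

print "$_\n" for merge_pair("abc\tcab\tbca", "123\t453\t756");
```

Each returned string becomes one output line, which is why two input lines with three fields each turn into three output lines.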
While I managed to get this done, I doubt that I found a good (fast) solution. So before I move on to the large files that have to be processed, I'd like your input on a better one.
This is how I did it:
#!/usr/bin/perl -w
use strict;

my (@merge_one, @merge_two, @merge_three);

open (FILE, "file.txt") or die "Cannot open the input file: $!";
my @input_file = <FILE>;
close FILE;
chomp @input_file;

foreach (0..$#input_file) {
    my $next = $_ + 1;
    # skip the last line and lines already merged into their predecessor
    next unless defined $input_file[$_] and defined $input_file[$next];
    if ($input_file[$_] =~ m/\t/ && $input_file[$next] =~ m/\t/) {
        @merge_one = split /\t/, $input_file[$_];
        @merge_two = split /\t/, $input_file[$next];
        # pair the fields column by column: "abc 123", "cab 453", ...
        for (0..$#merge_two) {
            $merge_three[$_] = $merge_one[$_] . " " . $merge_two[$_];
        }
        $input_file[$_] = join "\n", @merge_three;
        $input_file[$next] = undef;    # mark the second line as consumed
        (@merge_one, @merge_two, @merge_three) = ();
    }
}

my $output = join "\n", grep { defined } @input_file;
open (OUTFILE, ">input_file2.txt") or die "Cannot open the output file: $!";
print OUTFILE $output, "\n";
close OUTFILE;
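The task speaks of lines with a certain number of tabs, while the pattern m/\t/ accepts any line with at least one tab. A sketch of an exact-count test: tr/\t// returns the number of tabs in the string and, with an empty replacement list, leaves the string unchanged. The helper name has_tabs and the count of 2 are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line contains exactly $wanted tab characters.
# tr/\t// only counts here; it does not modify $line.
sub has_tabs {
    my ($line, $wanted) = @_;
    return ($line =~ tr/\t//) == $wanted;
}

for my $line ("Blablabla", "abc\tcab\tbca", "a\tb\tc\td") {
    my $count = ($line =~ tr/\t//);
    print "$count tab(s): ", (has_tabs($line, 2) ? "merge candidate" : "keep"), "\n";
}
```

Counting with tr/// is also cheaper than a regex match, which matters a little when scanning large files.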
Not bad IMO.
One improvement would be to process the file line by line instead of reading the whole file into memory. The example below merges pairs of lines that contain exactly two tabs:
open my $infile,  '<', 'file.txt'        or die "Can't open ... $!";
open my $outfile, '>', 'input_file2.txt' or die "Can't open ... $!";
my @pairs;

# turn a list of tab-separated lines into columns: one array ref per field
sub merge {
    my $ref = shift;
    my @merged;
    while ( my $line = shift @$ref ) {
        chomp $line;
        my @tmp = split /\t/, $line;
        push @{ $merged[$_] }, $tmp[$_] for 0..$#tmp;
    }
    @merged
}

while (<$infile>) {
    if ( tr/\t// == 2 and @pairs <= 1 ) {
        push @pairs, $_;
    } elsif ( @pairs == 1 ) {
        print $outfile shift @pairs;
        print $outfile $_;
    } else {
        print $outfile "@$_\n" for merge( \@pairs );
        print $outfile $_;
    }
}

# flush what is left at end of file; a single unpaired line is printed as is
if ( @pairs == 1 ) {
    print $outfile shift @pairs;
} else {
    print $outfile "@$_\n" for merge( \@pairs );
}
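To try the line-by-line approach without real files, the loop can be wrapped in a subroutine that takes any input and output handle; Perl's in-memory filehandles (open my $fh, '<', \$string) stand in for file.txt and input_file2.txt. This is a restructured variant, not the code above verbatim: it keeps a single $pending line instead of @pairs. It assumes that exactly $tabs tabs mark a line for merging and that an unpaired tabbed line is passed through unchanged; the name merge_stream is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical wrapper: read from $in, write to $out, merging each pair of
# consecutive lines that contain exactly $tabs tab characters.
sub merge_stream {
    my ($in, $out, $tabs) = @_;
    my $pending;    # a tabbed line still waiting for its partner
    while (my $line = <$in>) {
        chomp $line;
        if (($line =~ tr/\t//) == $tabs) {
            if (defined $pending) {
                # limit -1 keeps trailing empty fields, so both lines
                # split into exactly $tabs + 1 fields
                my @first  = split /\t/, $pending, -1;
                my @second = split /\t/, $line, -1;
                print $out "$first[$_] $second[$_]\n" for 0 .. $#first;
                undef $pending;
            } else {
                $pending = $line;
            }
            next;
        }
        # not a merge candidate: flush any unpaired tabbed line first
        print $out "$pending\n" if defined $pending;
        undef $pending;
        print $out "$line\n";
    }
    print $out "$pending\n" if defined $pending;
}

my $input  = "Blablabla\nabc\tcab\tbca\n123\t453\t756\nBlablabla\nBlablabla\n";
my $output = '';
open my $in,  '<', \$input  or die "Can't open input: $!";
open my $out, '>', \$output or die "Can't open output: $!";
merge_stream($in, $out, 2);
close $out;
print $output;
```

Only one pending line is held in memory at a time, so memory use stays flat regardless of file size; for real files, pass the handles from open on file.txt and input_file2.txt instead.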
-- Gunnar Hjalmarsson Email: http://www.gunnar.cc/cgi-bin/contact.pl