Zhang Weiwu wrote:
> Hi.
>
> I got a datasheet from my colleague in MS Excel format and I intend to
> process that file with my awk/sed knowledge. The problem is: he sent me
> two Excel files each with 2134 records, in fact there should be only one
> excel file with 2134 rows and 295 columns, but MS Excel can only handle
> 256 data columns, so he split the datasheet vertically so he can manage
> to send to me.
>
> Now I saved both file to tab-separated-value format, how do I join them?
>
> I could have used join(1) but that require a join field, an ID of some
> sort. I think of this:
>
> $ grep -n '' left.tsv | sed 's/:/\t/'> left.forjoin
> $ grep -n '' right.tsv | sed 's/:/\t/'> right.forjoin
> $ join -t " " left.forjoin right.forjoin > result.tsv
> (note that for join's -t parameter somehow I need to manage to get a tab
> between the quotes)
>
> Yes I achieved what I want, but that looks complex. Is there a simpler
> way? Thanks in advance.
>
> I know OpenOffice 3.0 can handle up to 1024 data columns. It's difficult
> to convince anyone to switch to OOO because here in China MS Office
> costs only 0$. I also could use OOO3.0 for doing the join but I wish to
> know the commandline way:)
>
Got perl?
#!/usr/bin/perl
if($#ARGV < 1) {
print "Arguments: <file1> <file2>\n";
exit(1);
}
open(FIRSTFILE, $ARGV[0]);
open(SECONDFILE, $ARGV[1]);
@first = <FIRSTFILE>;
@second = <SECONDFILE>;
$i = 0;
for($i = 0;$i < 2; $i++) {
$tmp1 = $first[$i];
$tmp1 =~ s/\n//g;
$tmp2 = $second[$i];
$tmp2 =~ s/\n//g;
$str = $tmp1 . "\t" . $tmp2 . "\n";
print $str;
}
close(FIRSTFILE);
close(SECONDFILE);
This is likely not the best or fastest way to do it, and I don't have a
dataset as large as yours readily available for testing, but it seems to
work.
-Tim
--
[email protected] mailing list