This is more a curiosity question. I have written a bash script which
reads a set of bzip2-compressed files. For each record in a file, it
writes the record into a file whose name is based on the first two
"words" in the record and the "generation number" from the input file
name. The input is extremely large: 47 files, each of which would be
around 120 GB to 180 GB expanded, or 23 to 27 million lines.
Basically there are probably around 50 or so (I don't know exactly)
possible combinations of the "words". I'm wondering whether rewriting
the script in either Python or Perl (both basically interpreted) would
be worth my while. Or should I go with a compiled language such as
C/C++? Or, lastly, is it basically irrelevant due to the extremely
large number of records and the minimal processing per record, which
would mean that I/O dominates the application?

If you're interested, the bash script looks like:

#!/bin/bash
for i in irradu00.g*.bz2; do
        gen=${i#irradu00.}   # remove prefix
        gen=${gen%.bz2}      # remove suffix, leaving the generation number
        bzcat "$i" |
        while IFS= read -r line; do
                fn=${line%% *}   # first word: strip everything after the first space
                ft=${line:9:8}   # second word: 8 characters starting at offset 9...
                ft=${ft%% *}     # ...with any trailing spaces stripped
                echo "${line}" >>"${fn}.${ft}.${gen}.tx2"
        done
done

If you're curious about the "set ${line}" approach: I just couldn't
figure out a way to parse the line that way.

--
Maranatha! <><
John McKown

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
