Cuteness, with the exception of small kittens, is in the eye of the beholder.
However, you'll want to keep that whole explanation in your script as a comment,
because it isn't obvious what that pile of pipes does or why you did it that way.
At least, it sure wasn't obvious to me.
It is, however, efficient. I tried a couple of alternatives, using
heavier-duty scripting languages to try to save time by taking only one look at
the input. The AWK script
awk '{ print | "bzip2 -9 > words.bz2" }  # every line goes to one bzip2 process
/z/ { print > "words.has-z" }            # lines containing "z" go to a file
END { print NR }' /usr/share/dict/words
took about three times as long as the shell script you've got here. The Perl
script
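#!/usr/bin/perl
use strict;
use warnings;

# Lines containing "z" go to one file; every line goes to a single bzip2 process.
open(my $fh, '>', 'words.has-z.pl') or die "Could not open grep file: $!";
open(my $bz, '|-', 'bzip2 -9 > words.bz2') or die "Could not open pipe: $!";
while (<>) {
    print $fh $_ if /z/;
    print $bz $_;
}
print "$. lines read\n";
close($fh) or die "Could not close grep file: $!";
close($bz) or die "bzip2 failed: $?";   # closing a pipe waits for bzip2 to exit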
run with
./words.pl /usr/share/dict/words
ran in essentially the same time as the shell commands. Given the relatively
small file I've got here, and the fact that the run time is dominated by bzip2
anyway, the Perl script is probably good enough as is. If you're *REALLY*
worried about CPU time, you might be able to find a Perl module that writes
compressed files directly, although I don't know whether that would be faster
than running an external bzip2 that presumably was compiled for speed.
I guess it's a matter of taste: very fancy shell piping versus straightforward
Perl. If you speak Perl the answer is obvious; if you don't it's different but
still obvious.
Ted Rodriguez-Bell
Enterprise Virtualization, z/VM and z/Linux, Wells Fargo
(415) 477-6891 office (415) 516-7913 cell
201 3rd St., MAC A0187-050, San Francisco, CA 94103
[email protected] or http://www.vtext.com text paging (but cell is safer)
P.S. The results surprised me, since I'd tried rewriting wc in AWK and Perl a
while ago and the AWK version was much closer to the wc command. I tried it
again and found that "perl -ane" instead of "perl -ne" can waste a lot of time,
because -a splits every input line into the @F array whether you use it or not.
Don't autosplit unless you need to!
-----Original Message-----
From: McKown, John [mailto:[email protected]]
Sent: Tuesday, August 14, 2012 9:31 AM
Subject: Is the following "too cute"?
OK, I'm old and used to underpowered hardware. I download some data from z/OS
to process on Linux. I do three things to the data: (1) count how many lines are
in it (wc command); (2) copy selected records into another file (egrep); (3)
bzip2 it. To avoid extra I/O, I may be doing something "too cute".
What do you think? I process this with the command:
cat irradu00.g1115v00 | \
tee >(wc >|wc.irradu00.g1115v00.txt) \
    >(egrep '^...USER ' >|add-alt-del-user.g1115v00.txt) | \
bzip2 >|irradu00.g1115v00.bz2 && \
rm irradu00.g1115v00
The original file is around 150 GB and contains about 24 million lines.
Because it is so huge, I know that it cannot reside entirely in the disk cache.
That's why I pipe it into "tee", use process substitution to feed wc and egrep,
and pipe the tee output into bzip2. This way the file is read only once.
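Two details of the one-liner are worth a second look. First, without `set -o pipefail` the pipeline's exit status is bzip2's alone, so `&& rm` can delete the original even if the read or tee failed partway through. Second, bash does not wait for `>( )` process substitutions, so wc and egrep may still be writing their output files when the command returns. The sketch below keeps the single read but swaps process substitution for named pipes (mkfifo) so each reader can be waited on, checks every stage's status through PIPESTATUS, and only then removes the input. It is an illustration, not a tested production script: the function name fanout_compress and its argument order are hypothetical, it assumes bash plus GNU coreutils, it uses `grep -E` (the spelled-out form of egrep), and it runs `wc -l` because only the line count was asked for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the input once, fan it out to wc -l, grep -E and
# bzip2, and delete the input only if every stage succeeded.
# Assumes bash and mkfifo/mktemp from GNU coreutils. Uses >| like the original,
# so it also works with noclobber set.

fanout_compress() {
    local in=$1 count_out=$2 match_out=$3 pattern=$4
    local dir wc_pid grep_pid wc_status grep_status
    local -a pipe_status

    # Bail out early: if tee never opens the FIFOs, the readers would block forever.
    [ -r "$in" ] || return 1

    dir=$(mktemp -d) || return 1
    mkfifo "$dir/wc" "$dir/grep" || { rm -rf "$dir"; return 1; }

    # Readers on the named pipes run in the background; keep their PIDs.
    wc -l < "$dir/wc" >| "$count_out" &
    wc_pid=$!
    grep -E "$pattern" < "$dir/grep" >| "$match_out" &
    grep_pid=$!

    # The single read: tee copies the input to both FIFOs and to bzip2.
    tee "$dir/wc" "$dir/grep" < "$in" | bzip2 >| "$in.bz2"
    pipe_status=("${PIPESTATUS[@]}")    # [0] = tee, [1] = bzip2

    # Unlike >( ) process substitutions, these readers can be waited for.
    wait "$wc_pid";   wc_status=$?
    wait "$grep_pid"; grep_status=$?
    rm -rf "$dir"

    # grep exits 1 when nothing matched, which is not an error here.
    if [ "${pipe_status[0]}" -eq 0 ] && [ "${pipe_status[1]}" -eq 0 ] &&
       [ "$wc_status" -eq 0 ] && [ "$grep_status" -le 1 ]; then
        rm -- "$in"
    else
        return 1
    fi
}
```

For the job above, the call would look something like `fanout_compress irradu00.g1115v00 wc.irradu00.g1115v00.txt add-alt-del-user.g1115v00.txt '^...USER '`. If you would rather keep the original one-liner, adding `set -o pipefail` before it already closes the first gap, though not the second.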
John McKown
Systems Engineer IV
IT
Administrative Services Group
HealthMarkets(r)
9151 Boulevard 26 * N. Richland Hills * TX 76010
(817) 255-3225 phone *
[email protected] * www.HealthMarkets.com
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
----------------------------------------------------------------------