Cuteness, with the exception of small kittens, is in the eye of the beholder.  
However, you'll want to leave that whole text in your script as a comment 
because it isn't obvious what that pile is or why you did it.  At least it sure 
wasn't obvious to me.
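
For what it's worth, here is a rough sketch of how a commented version might look.  It is only an illustration, not your actual script: the function name single_pass, the output file names, and the optional third argument for swapping the compressor are all made up for the example, and unlike your command it does not remove the input afterwards.

```bash
#!/usr/bin/env bash
# Sketch only: single_pass and its output file names are hypothetical.
#
# Read the input exactly once and do three jobs with it:
#   1. count it (wc)
#   2. save the records matching a pattern (grep -E)
#   3. compress the full stream (bzip2 unless told otherwise)
# The input is too big to stay in the disk cache, so three separate
# commands would mean three full reads from disk.
single_pass() {
    local input=$1 pattern=$2 compressor=${3:-bzip2}

    # tee copies its stdin to each >(...) process substitution and to
    # its own stdout, which feeds the compressor.
    # >| overwrites the target even when "set -o noclobber" is in effect.
    tee >(wc >| "$input.wc.txt") \
        >(grep -E "$pattern" >| "$input.selected.txt") \
        < "$input" |
        "$compressor" >| "$input.compressed"
}
```

Calling it as "single_pass irradu00.g1115v00 '^...USER '" would mirror the original command.  One thing worth putting in the comment as well: the >(...) processes run asynchronously, so wc and grep may still be finishing for a moment after the function returns.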

It is, however, efficient.  I tried a couple of alternatives, using 
heavier-duty scripting languages to try to save time by taking only one look at 
each line.  The AWK script
  awk '{print | "bzip2 -9 > words.bz2"}
       /z/ {print > "words.has-z"}
       END {print NR}' /usr/share/dict/words
took about three times as long as the shell script you've got here.  The Perl 
script
  #! /usr/bin/perl

  open(FH, ">words.has-z.pl") || die "Could not open grep file";
  open(BZ, "|bzip2 -9 > words.bz2") || die "Could not open pipe";

  while (<>) {
    print FH $_
      if (/z/);
    print BZ $_;
  }
  print "$. lines read\n";
run with
  ./words.pl /usr/share/dict/words
was actually indistinguishable in run time from the shell commands.  Given the 
relatively small file I've got here and the fact that the run time is dominated 
by bzip2 anyway, the Perl script is probably good enough as is.  If you're 
*REALLY* worried about CPU time you might be able to find a Perl module that 
writes compressed files directly---although I don't know if that would be faster 
than running an external bzip2 that presumably was compiled for speed.

I guess it's a matter of taste:  very fancy shell piping versus straightforward 
Perl.  If you speak Perl the answer is obvious; if you don't it's different but 
still obvious.

Ted Rodriguez-Bell 
Enterprise Virtualization, z/VM and z/Linux, Wells Fargo
(415) 477-6891 office   (415) 516-7913 cell 
201 3rd St., MAC A0187-050, San Francisco, CA 94103 
[email protected] or http://www.vtext.com text paging (but cell is safer) 

P.S.  The results surprised me, since I'd tried rewriting wc in AWK and Perl a 
while ago and the AWK version was much closer to the wc command.  I tried it 
again and found "perl -ane" instead of "perl -ne" can waste a lot of time.  
Don't autosplit unless you need to!


-----Original Message-----
From: McKown, John [mailto:[email protected]] 
Sent: Tuesday, August 14, 2012 9:31 AM
Subject: Is the following "too cute"?

OK, I'm old and used to underpowered hardware. I download some data from z/OS 
to process on Linux. I do three things to the data: (1) count how many lines are 
in it (wc command); (2) copy selected records into another file (egrep); (3) 
bzip2 it. I may be doing something "too cute" to do this (to avoid extra I/O). 
What do you think? I process this with the command:

cat irradu00.g1115v00 | \
tee >(wc >|wc.irradu00.g1115v00.txt) \
    >(egrep '^...USER ' >|add-alt-del-user.g1115v00.txt) | \
bzip2 >|irradu00.g1115v00.bz2 && \
rm irradu00.g1115v00

The original file is around 150 gig. It contains about 24 million lines. 
Because it is so huge, I know that it cannot entirely reside in the disk cache. 
That's why I pipe it into "tee" and use process substitution into wc and grep, 
and pipe into bzip2. This reads the file only once.


John McKown
Systems Engineer IV
IT

Administrative Services Group

HealthMarkets(r)

9151 Boulevard 26 * N. Richland Hills * TX 76010
(817) 255-3225 phone *
[email protected] * www.HealthMarkets.com


----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
