Thanks for the feedback. I was mainly hoping to decrease total run time by 
reducing disk reads of a huge file. I agree the Perl code is good, too. But 
I've never seen the words Perl and straightforward used together that way 
before <grin/>. I use Perl quite a bit myself. But my code is not really 
"straightforward". I use regular expressions way too much to do parsing. A bad 
habit, I admit. 

-- 
John McKown
Systems Engineer IV
IT

Administrative Services Group

HealthMarkets®

9151 Boulevard 26 . N. Richland Hills . TX 76010
(817) 255-3225 phone .
[email protected] . www.HealthMarkets.com

Confidentiality Notice: This e-mail message may contain confidential or 
proprietary information. If you are not the intended recipient, please contact 
the sender by reply e-mail and destroy all copies of the original message. 
HealthMarkets® is the brand name for products underwritten and issued by the 
insurance subsidiaries of HealthMarkets, Inc. -The Chesapeake Life Insurance 
Company®, Mid-West National Life Insurance Company of TennesseeSM and The MEGA 
Life and Health Insurance Company.SM


> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, August 15, 2012 6:32 PM
> To: [email protected]
> Cc: McKown, John
> Subject: RE: Is the following "too cute"?
> 
> Cuteness, with the exception of small kittens, is in the eye of the
> beholder.  However, you'll want to leave that whole text in your script
> as a comment because it isn't obvious what that pile is or why you did
> it.  At least it sure wasn't obvious to me.
> 
> It is, however, efficient.  I tried a couple of alternatives, using
> heavier-duty scripting languages to try to save time by taking only one
> look at the file.  The AWK script
>    awk '{print | "bzip2 -9 > words.bz2"}
>                 /z/ {print > "words.has-z"}
>                 END {print NR}' /usr/share/dict/words
> took about three times as long as the shell script you've got here.
> The Perl script
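>   #! /usr/bin/perl
>   use strict;
>   use warnings;
> 
>   # Matching lines go to a plain file; every line is piped to bzip2.
>   open(my $fh, '>', 'words.has-z.pl') || die "Could not open grep file: $!";
>   open(my $bz, '|-', 'bzip2 -9 > words.bz2') || die "Could not open pipe: $!";
> 
>   while (<>) {
>     print $fh $_
>       if (/z/);
>     print $bz $_;
>   }
>   print "$. lines read\n";
>   # Closing the pipe waits for bzip2 and reports a non-zero exit status.
>   close($bz) || die "bzip2 failed";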
> run with
>   ./words.pl /usr/share/dict/words
> actually was indistinguishable from the shell commands.  Given the
> relatively small file I've got here and the fact that the run time is
> dominated by bzip anyway, the Perl script is probably good enough as
> is.  If you're *REALLY* worried about CPU time you might be able to
> find a Perl module that wrote compressed files directly---although I
> don't know if that would be faster than running an external bzip that
> presumably was compiled for speed.
> 
> I guess it's a matter of taste:  very fancy shell piping versus
> straightforward Perl.  If you speak Perl the answer is obvious; if you
> don't it's different but still obvious.
> 
> Ted Rodriguez-Bell
> Enterprise Virtualization, z/VM and z/Linux, Wells Fargo
> (415) 477-6891 office   (415) 516-7913 cell
> 201 3rd St., MAC A0187-050, San Francisco, CA 94103
> [email protected] or http://www.vtext.com text paging (but cell is
> safer)
> 
> P.S.  The results surprised me, since I'd tried rewriting wc in AWK and
> Perl a while ago and the AWK version was much closer to the wc command.
> I tried it again and found "perl -ane" instead of "perl -ne" can waste
> a lot of time.  Don't autosplit unless you need to!
> 
> Company policy requires:  This message may contain confidential and/or
> privileged information.  If you are not the addressee or authorized to
> receive this for the addressee, you must not use, copy, disclose, or
> take any action based on this message or any information herein.  If
> you have received this message in error, please advise the sender
> immediately by reply e-mail and delete this message.  Thank you for
> your cooperation.
> 
> 
> -----Original Message-----
> From: McKown, John [mailto:[email protected]]
> Sent: Tuesday, August 14, 2012 9:31 AM
> Subject: Is the following "too cute"?
> 
> OK, I'm old and used to underpowered hardware. I download some data
> from z/OS to process on Linux. I do three things to the data: (1) count
> how many lines are in it (wc command); (2) copy selected records into
> another file (egrep); (3) bzip2 it. I may be doing something "too cute"
> to do this (to avoid extra I/O). What do you think? I process this with
> the command:
> 
> cat irradu00.g1115v00 | \
> tee >(wc >|wc.irradu00.g1115v00.txt) \
>     >(egrep '^...USER ' >|add-alt-del-user.g1115v00.txt) | \
> bzip2 >|irradu00.g1115v00.bz2 && \
> rm irradu00.g1115v00
> 
> The original file is around 150 gig. It contains about 24 million
> lines. Because it is so huge, I know that it cannot entirely reside in
> the disk cache. That's why I pipe it into "tee" and use process
> substitution into wc and grep, and pipe into bzip2. This reads the
> file only once.
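
The same single-read idea can also be sketched with named pipes instead of process substitution. This is a swapped-in variant, not the command quoted above. Its one advantage is that the script can wait for every reader and check each exit status before anything is deleted, whereas the shell does not wait for process substitutions. The function name single_pass, the output file names, and the COMPRESSOR variable are all hypothetical, and "grep -E" stands in for egrep.

```shell
# single_pass INPUT PATTERN OUTDIR
# Hypothetical helper: reads INPUT once and feeds three consumers through FIFOs.
single_pass() {
    local input="$1" pattern="$2" outdir="$3" fifos rc
    [ -r "$input" ] || return 1    # a missing input would leave the readers blocked
    fifos=$(mktemp -d) || return 1
    mkfifo "$fifos/count" "$fifos/select" "$fifos/pack" || return 1

    # Start the three readers in the background; each blocks until its FIFO
    # is opened for writing.
    wc -l < "$fifos/count" > "$outdir/count.txt" &
    local count_pid=$!
    grep -E "$pattern" < "$fifos/select" > "$outdir/selected.txt" &
    local select_pid=$!
    "${COMPRESSOR:-bzip2}" < "$fifos/pack" > "$outdir/data.bz2" &
    local pack_pid=$!

    # The only read of the input: tee copies it to all three FIFOs.
    tee "$fifos/count" "$fifos/select" < "$input" > "$fifos/pack"
    rc=$?

    # Wait for every reader; grep exiting 1 only means "no lines matched".
    wait "$count_pid" || rc=1
    wait "$select_pid" || [ $? -eq 1 ] || rc=1
    wait "$pack_pid" || rc=1
    rm -rf "$fifos"
    return "$rc"
}
```

With that defined, something like "single_pass irradu00.g1115v00 '^...USER ' . && rm irradu00.g1115v00" would remove the input only after all three consumers have finished cleanly.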
> 
> 
> John McKown
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390
> or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
