Thanks for the feedback. I was mainly hoping to decrease total run time by reducing disk reads of a huge file. I agree the Perl code is good, too. But I've never seen the words Perl and straightforward used together that way before <grin/>. I use Perl quite a bit myself. But my code is not really "straightforward". I use regular expressions way too much to do parsing. A bad habit, I admit.
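Here is a minimal sketch of the read-it-once idea from my quoted command, assuming bash plus the usual tee, wc, grep and bzip2 tools. It swaps the >(...) process substitutions for named pipes (mkfifo) and background jobs. bash generally does not wait for process substitutions to finish before it runs the next command, and their exit statuses never reach the && in front of rm. With set -o pipefail, a failed tee is also no longer hidden behind a successful bzip2. The function name single_pass, its argument order, and the use of wc -l (lines only, not plain wc) are illustrative choices of mine, not anything tested on the real 150 gig file.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming bash and the standard tee/wc/grep/bzip2 tools.
# single_pass is a hypothetical helper name, not something from this thread.
set -o pipefail   # a pipeline now fails if ANY stage fails, not just the last

# single_pass INPUT COUNT_OUT FILTER_OUT ARCHIVE_OUT PATTERN
# Reads INPUT once and fans it out to a line count, a grep -E filtered copy,
# and a bzip2 archive. Returns non-zero if any consumer failed.
# It never deletes INPUT; the caller decides that from the return status.
single_pass() {
    local input=$1 count_out=$2 filter_out=$3 archive_out=$4 pattern=$5
    local dir count_pid filter_pid rc status=0

    # Fail early: if tee could not open INPUT, the FIFO readers would block forever.
    [ -r "$input" ] || return 1

    dir=$(mktemp -d) || return 1
    if ! mkfifo "$dir/count" "$dir/filter"; then
        rm -rf "$dir"
        return 1
    fi

    # Background readers. Unlike >(...) process substitutions, these have
    # PIDs the shell can wait for and whose exit status it can check.
    wc -l <"$dir/count" >"$count_out" &
    count_pid=$!
    grep -E -- "$pattern" <"$dir/filter" >"$filter_out" &
    filter_pid=$!

    # tee copies its input to both FIFOs and to stdout, so INPUT is read once.
    tee "$dir/count" "$dir/filter" <"$input" | bzip2 >"$archive_out" || status=1

    wait "$count_pid" || status=1
    # grep exits 1 when no line matched; only a status above 1 is an error.
    wait "$filter_pid"
    rc=$?
    [ "$rc" -le 1 ] || status=1

    rm -rf "$dir"
    return "$status"
}
```

With the names from the quoted command it might be called as

    single_pass irradu00.g1115v00 wc.irradu00.g1115v00.txt \
        add-alt-del-user.g1115v00.txt irradu00.g1115v00.bz2 '^...USER ' &&
        rm irradu00.g1115v00

so the original file is removed only after all three outputs are complete.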
--
John McKown
Systems Engineer IV
IT

Administrative Services Group

HealthMarkets®

9151 Boulevard 26 * N. Richland Hills * TX 76010
(817) 255-3225 phone *
[email protected] * www.HealthMarkets.com

Confidentiality Notice: This e-mail message may contain confidential or
proprietary information. If you are not the intended recipient, please
contact the sender by reply e-mail and destroy all copies of the
original message. HealthMarkets® is the brand name for products
underwritten and issued by the insurance subsidiaries of HealthMarkets,
Inc. -The Chesapeake Life Insurance Company®, Mid-West National Life
Insurance Company of TennesseeSM and The MEGA Life and Health Insurance
Company.SM

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, August 15, 2012 6:32 PM
> To: [email protected]
> Cc: McKown, John
> Subject: RE: Is the following "too cute"?
>
> Cuteness, with the exception of small kittens, is in the eye of the
> beholder. However, you'll want to leave that whole text in your script
> as a comment because it isn't obvious what that pile is or why you did
> it. At least it sure wasn't obvious to me.
>
> It is, however, efficient. I tried a couple of alternatives, using
> heavier-duty scripting languages to try to save time by taking only one
> look at
> awk '{print | "bzip2 -9 > words.bz2"}
> /z/ {print > "words.has-z"}
> END {print NR}' /usr/share/dict/words
> Took about three times as long as the shell script you've got here.
> The Perl script
> #! /usr/bin/perl
>
> open(FH, ">words.has-z.pl") || die "Could not open grep file";
> open(BZ, "|bzip2 -9 > words.bz2") || die "Could not open pipe";
>
> while (<>) {
>     print FH $_
>         if (/z/);
>     print BZ $_;
> }
> print "$. lines read\n";
> run with
> ./words.pl /usr/share/dict/words
> actually was indistinguishable from the shell commands.
> Given the relatively small file I've got here and the fact that the run
> time is dominated by bzip anyway, the Perl script is probably good
> enough as is. If you're *REALLY* worried about CPU time you might be
> able to find a Perl module that wrote compressed files directly,
> although I don't know if that would be faster than running an external
> bzip that presumably was compiled for speed.
>
> I guess it's a matter of taste: very fancy shell piping versus
> straightforward Perl. If you speak Perl the answer is obvious; if you
> don't it's different but still obvious.
>
> Ted Rodriguez-Bell
> Enterprise Virtualization, z/VM and z/Linux, Wells Fargo
> (415) 477-6891 office (415) 516-7913 cell
> 201 3rd St., MAC A0187-050, San Francisco, CA 94103
> [email protected] or http://www.vtext.com text paging (but cell is
> safer)
>
> P.S. The results surprised me, since I'd tried rewriting wc in AWK and
> Perl a while ago and the AWK version came much closer to the speed of
> the wc command. I tried it again and found that "perl -ane" instead of
> "perl -ne" can waste a lot of time. Don't autosplit unless you need to!
>
> Company policy requires: This message may contain confidential and/or
> privileged information. If you are not the addressee or authorized to
> receive this for the addressee, you must not use, copy, disclose, or
> take any action based on this message or any information herein. If
> you have received this message in error, please advise the sender
> immediately by reply e-mail and delete this message. Thank you for
> your cooperation.
>
>
> -----Original Message-----
> From: McKown, John [mailto:[email protected]]
> Sent: Tuesday, August 14, 2012 9:31 AM
> Subject: Is the following "too cute"?
>
> OK, I'm old and used to underpowered hardware. I download some data
> from z/OS to process on Linux. I do three things to the data: (1) count
> how many lines are in it (wc command); (2) copy selected records into
> another file (egrep); (3) bzip2 it.
> I may be doing something "too cute" here (to avoid extra I/O). What do
> you think? I process this with the command:
>
> cat irradu00.g1115v00 | \
> tee >(wc >|wc.irradu00.g1115v00.txt) \
>     >(egrep '^...USER ' >|add-alt-del-user.g1115v00.txt) | \
> bzip2 >|irradu00.g1115v00.bz2 && \
> rm irradu00.g1115v00
>
> The original file is around 150 gig. It contains about 24 million
> lines. Because it is so huge, I know that it cannot entirely reside in
> the disk cache. That's why I pipe it into "tee", use process
> substitution to feed wc and egrep, and pipe the result into bzip2. This
> way the file is read only once.
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390
> or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
