Re: format of output file

James G. Sack (jim) Fri, 05 May 2006 00:55:45 -0700

RBW wrote:
>..
> I stuck it right in the loop and it rendered perfectly in the output.
> Are there any efficiency plusses or minuses with this approach so far
> (elegance vs. brute force)? Like I said it is working fine but I know
> the minute you go past a few lines you should keep your eyes out for
> unnecessarily taxing or sucking up resources...
> 
> I'm fixing up a proxy.pac file for someone BTW...
> 
> Can you point me in the right direction for inserting this list as is in
> the output file into the proxy.pac file (just the clues and hints for
> now so I can track it down myself  ;^). I will just be cutting and
> pasting for now but I am interested in making this list and another get
> inserted at a given place in the proxy.pac file via code.
>


I don't think the processing you are doing is likely to justify
efficiency concerns.

However, if you were doing by-line processing on some really big files,
then you might want to read, process, and write one line at a time rather
than slurping the whole file into memory. But then you would have to do
the sorting as a separate step. People don't really sweat the memory use
of transient perl scripts these days until the data gets well into the
MB range, though.
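
As a rough sketch of the line-at-a-time approach (not RBW's actual script):
clean_line and stream_lines are made-up names, and the cleanup shown, which
just strips whitespace, is only a stand-in for whatever the real processing
is. In-memory filehandles are used so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-line cleanup: strip all whitespace from the line.
sub clean_line {
    my ($line) = @_;
    $line =~ s/\s+//g;
    return $line;
}

# Read one line, process it, write it, and move on. Only the current
# line is held in memory, so nothing can be sorted inside this loop.
sub stream_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        my $cleaned = clean_line($line);
        print {$out} "$cleaned\n";
    }
    return;
}

# Demo on in-memory filehandles (sample data is invented).
my $input  = " b.example.com \na.example.com\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
stream_lines($in, $out);
close $out;
print $output;
```

In a real script the call would presumably be stream_lines(\*STDIN, \*STDOUT),
with the sort done before or after as a separate step.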

If performance were a concern, I might want to test a separate sort step,
either before the perl script or maybe even after it, depending on whether
the removed spaces affect your intended sort order. Furthermore, an
external sort does have a nice -u (unique) option, if duplicates were a
concern. I don't really know whether /bin/sort is faster than perl or
not, but I suspect it could well be.

Again, for really big datasets, it is worth noting that the kind of line
processing you are doing is an exact fit for sed's design purpose, and a
sed-versus-perl performance test might well favor sed for this. OTOH,
it's a lot easier to maintain if you stick to one tool, and perl is
certainly very suitable for this (and a lot more), so whether it uses
more RAM or even runs a bit slower might be of no significance.


A more important question is whether you care about bad input -- dups,
blank lines or garbage lines, say. Is there any bullet-proofing you want
to consider?
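
For what it's worth, here is a minimal sketch of that kind of
bullet-proofing. It assumes the input is one host name per line, which is
only a guess from the file name; clean_entries and the validity pattern
are made up for illustration and would need adjusting to the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the usable entries from a list of raw lines: trimmed, with
# blank lines, garbage lines, and duplicates dropped, then sorted.
# ASSUMPTION: a valid entry looks like a host name (letters, digits,
# dots, hyphens); adjust the pattern to the real input format.
sub clean_entries {
    my @raw = @_;
    my %seen;
    my @good;
    for my $line (@raw) {
        $line =~ s/^\s+|\s+$//g;                  # trim, including the newline
        next if $line eq '';                      # blank line
        next unless $line =~ /^[A-Za-z0-9.-]+$/;  # garbage line
        next if $seen{lc $line}++;                # duplicate, ignoring case
        push @good, $line;
    }
    my @sorted = sort @good;
    return @sorted;
}

# Demo with invented sample lines.
my @entries = clean_entries(
    "b.example.com\n", "\n", "a.example.com \n",
    "B.example.com\n", "not a host!\n",
);
print "$_\n" for @entries;
```

The %seen hash does the same job as sort -u would, just inside perl, so
the whole thing stays in one tool.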


RE: elegance questions-

If your code looks like pseudo code, that's probably a very reasonable
approximation to elegance, in my opinion.

Your code looks ok to me, and I don't see any great offense to
perlishness. If the maintainer can understand it, then it's probably
just fine.

The one question I would raise is whether you will ever need to run the
same script on other input files. If so, you could just write it as a
filter from the start. And hard-coding a filename when stdin would work
just as well might seem like an elegance flaw to some.

Write:
  my @lines = <>;

Then use it as:
  ./testAppend.pl <AdBlockList-Hosts-4May06.test >outputTestAppend.txt
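
Put together, a filter version might look roughly like the sketch below.
The <> operator reads from any files named on the command line, or from
stdin when there are none. process_lines is a made-up name, and its body
(strip whitespace, then sort) is only a placeholder for the real
processing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real processing: strip whitespace
# from every line, sort, and put the newlines back.
sub process_lines {
    my @lines = @_;
    s/\s+//g for @lines;
    my @sorted = sort @lines;
    return map { "$_\n" } @sorted;
}

# Slurp everything from the named files or from stdin, then write the
# result to stdout so the shell decides where input and output live.
my @lines = <>;
print process_lines(@lines);
```

Since nothing is hard-coded, the same script also works in a pipeline or
on a different host list without editing.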

..jim

-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
