Re: Home made mail news search tool, and folded header lines

Harry Putnam Sat, 27 Mar 2004 02:00:48 -0800

"R. Joseph Newton" <[EMAIL PROTECTED]> writes:

[EMAIL PROTECTED],
Thanks for the advice and code snippet.  I'm studying your code now
to add to my knowledge base.  But understand that the base is quite
small at present, so it takes me a while to figure out what the code
is doing, usually with many trips to perldoc "something".


Going through your suggestions, I'm not following along in a few
places, as laid out below:

> Harry Putnam wrote:
>
>> I'm writing a home boy mail/news search tool and wondered if there is
>> a canonical way to handle folded or indented header lines.
>>
>> An example would be that I wanted to run a series of regexes against each
>> line of input (while in headers), grabbing the matches into an array
>> for printing.
>>
>> Something like:
>> [...] snipped getopts and other unrelated stuff
>>       while(<FILE>){
>>           chomp;
>>           my $line = $_;
>
> Why here?  Since you are doing this with each line, you could write in the loop
> control:
> while (my $line = <FILE>) {

Not sure I understand the advantage.  In my formulation, `$line' is
minus the trailing newline... which I've found to be nearly always a plus.
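
For illustration, the two forms are not mutually exclusive: the line
can be named in the loop control and still be chomped.  A minimal
sketch, assuming a lexical filehandle `$fh` (an in-memory string
stands in for FILE here, and the sample text is made up):

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for FILE.
my $text = "From: a\@example.com\nSubject: test\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;            # strip the trailing newline, as chomp on $_ would
    push @lines, $line;     # $line is scoped to the loop body
}
close $fh;
```

Either way `$line` ends up without its newline; the difference is
only that `$_` is left alone and `$line` is scoped to the loop.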

>>           ## @hdregs is an array of several regex for the headers
>>           for($ii=0;$ii<=$#hdregs;$ii++){
>
> Why no space between clauses?  Why no space around assignment
> operators?

Just how I've become accustomed to writing code.  Probably not a good
plan for when others need to read and revise it.

> Why a C-style for loop?  Are you using the index somewhere?

Well yes, sort of.  I wanted a way to ensure that each reg has hit at
least once.  Otherwise we don't print.  So I used a formulation like
this (Not posted previously for clarity):

         if ($data{$hdregs[$ii]}++ == 0) {
           ## it will only be 0 once
           $hdelem_hit_cnt++;
         }

Then before printing we compare $hdelem_hit_cnt to ($#hdregs + 1):

 sub test_hdr_good {
    if ($hdelem_hit_cnt == ($#hdregs + 1)) {
      $test_hdr_good = "TRUE";
      $hdelem_hit_cnt = 0;
    }
 }

They should be the same if all regs have hit at least once.  If not
the same... we don't print.
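
The same "every regex hit at least once" test can be sketched without
an index, by counting distinct keys in a hash.  This is only one
possible formulation, not the code from the tool itself; the patterns
and header lines are made up, and `%seen` and `$hdr_good` are
hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical patterns and header lines, for illustration only.
my @hdregs = (qr/^From:/, qr/^Subject:/);
my @header = ('From: a@example.com', 'Received: by mx1', 'Subject: test');

my (%seen, @hits);
for my $line (@header) {
    for my $re (@hdregs) {
        next unless $line =~ $re;
        push @hits, $line;      # capture the matching line
        $seen{$re}++;           # keyed by the stringified pattern
    }
}

# Every pattern has hit at least once when the number of distinct
# keys equals the number of patterns.
my $hdr_good = (scalar(keys %seen) == scalar(@hdregs)) ? 1 : 0;
```

One caveat: because the hash is keyed by the pattern's string form,
two identical patterns in `@hdregs` would count as one key, and the
test would then never pass.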


>>              if($line =~ /$hdregs[$ii]/){
>
> Right now, you have just gotten quite a bit of information about this line,
> including, with the same amount of effort, the type of header line involved.
>
>>
>>                 ## Capture the line
>>                 push @hits,$line;
>
> You now have thrown away the type information for the line, by throwing it back
> in an unsorted bag.  As Joe Ben Stamper said "When you fall, fall in the
> direction of your work".  These lines should probably be going into a hash,
> keyed to the portion of the line before the colon.  You may wish to throw out
> about 3/4 of them, since there are hundreds of different attributes carried in
> header lines, and only a small subset is going to be useful for data
> management.  Under any circumstances, you should probably try to capture *all*
> the information available at this point.

I'm not following you here.  The code does capture the entire line,
and with Randy's concatenation technique that includes any folded
lines (concatenated onto it).

Prior to printing the array is sorted like this:
      for(sort @hits){
          print...
      }
so that the output has some sort of uniformity.

Further, if I key a hash on the text before the colon, repeated
headers such as `Received' lines will overwrite one another and
disappear into the ether.

I plan to use this code for tracking Received lines at times.
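
For what it's worth, a hash keyed on the field name need not lose
repeated headers if each value is an array reference.  A minimal
sketch, assuming already-unfolded lines; the sample data and the
name `%by_name` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical, already-unfolded header lines.
my @header = (
    'Received: from mx2 by mx1',
    'From: a@example.com',
    'Received: from mx3 by mx2',
);

my %by_name;
for my $line (@header) {
    # Split at the first colon: field name, then the value.
    my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
    # Lowercase the key because header names are case-insensitive;
    # push keeps every occurrence, in the order seen.
    push @{ $by_name{lc $name} }, $value;
}
```

All `Received' values are then available in order as
`@{ $by_name{received} }`, so they can still be tracked hop by hop.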

[...]

> Then buffer the input.  Declare a variable outside of the loop to hold the
> previous line.  If the line currently being read begins with whitespace, join it
> to the $current_line with a newline.  It might take a little restructuring of
> the sequence within the loop.  This is one case where a priming read could be of
> assistance, since your loop could then have something in the buffer to spit out
> unless the line being read has space at the start.

Looks like you and Randy hit on the same thing for that situation.
Randy posted a nifty way to do just that.  I just didn't quite follow
the code at first.
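
The buffering idea can be sketched roughly as follows.  This is not
Randy's code, just one way to do it: a line starting with whitespace
is appended to the previous field.  It joins with a single space
rather than a newline, which is an assumption made here so that each
field stays on one line for regex matching; the message text and the
name `@fields` are made up:

```perl
use strict;
use warnings;

# Hypothetical message; the Received field is folded across two lines.
my $text = "Received: from mx2\n\tby mx1; Sat, 27 Mar 2004\n"
         . "Subject: test\n\nbody text\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @fields;
while (my $line = <$fh>) {
    chomp $line;
    last if $line eq '';              # a blank line ends the headers
    if ($line =~ /^\s+(.*)/ && @fields) {
        $fields[-1] .= " $1";         # continuation: append to previous field
    } else {
        push @fields, $line;          # start of a new field
    }
}
close $fh;
```

After the loop each element of `@fields` holds one complete header
field, so the regexes can be run against whole fields instead of
physical lines.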
