Re: Home made mail news search tool, and folded header lines

Harry Putnam Sat, 27 Mar 2004 07:18:56 -0800

"Charles K. Clarkson" <[EMAIL PROTECTED]> writes:


> : > >       while(<FILE>){
> : > >           chomp;
> : > >           my $line = $_;
> : >
> : > Why here.  Since you are doing this with each line,
> : > you could write in the loop control:
> : > while (my $line = <FILE>) {
> : 
> : Not sure I understand the advantage.  In my
> : formulation, `$line' is minus the trailing newline...
> : which I've found to be nearly always a plus.
>
>     I think Joseph was implying the 'chomp'. This is
> still shorter and IMO clearer than using $_.
>
>     while ( my $line = <FILE> ) {
>          chomp $line;

I hope it doesn't sound like I'm being hardheaded... at my stage of
skill I'm not likely to defend my practices as better than anyone
else's... but I'm having trouble seeing what is shorter or better
about this.  Both take 7 entries to type.  As for clarity, is it
because you say `chomp $line', so it is apparent what is being
chomped?
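
To put the two forms side by side, here is a minimal sketch. The
in-memory filehandles and the sample text are invented for
illustration; the point is only that both loops end up with the same
chomped lines.

```perl
use strict;
use warnings;

# Sample input (made up for illustration), read via in-memory filehandles.
my $text = "From: someone\nSubject: test\n";

# Form 1: read into $_, chomp it, then copy it to a named variable.
my @via_default;
open(my $fh1, '<', \$text) or die "open failed: $!";
while (<$fh1>) {
    chomp;
    my $line = $_;
    push @via_default, $line;
}
close $fh1;

# Form 2: read straight into a lexical and chomp it by name.
my @via_lexical;
open(my $fh2, '<', \$text) or die "open failed: $!";
while (my $line = <$fh2>) {
    chomp $line;
    push @via_lexical, $line;
}
close $fh2;

print "same result\n" if "@via_default" eq "@via_lexical";
```

One practical difference, as far as I can tell: the second form never
touches $_, so code called inside the loop that happens to modify $_
cannot clobber the current line.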

> : > >     ## @hdregs is an array of several regex for the
> : > >     ## headers
> : > >     for($ii=0;$ii<=$#hdregs;$ii++){

[...]

> : > Why a C-style for loop?  Are you using the index somewhere?
> : 
> : Well yes, sort of.
>
>     Assuming a non-C maintainer comes along, I would
> recommend the following. The C-style loop is confusing
> to those of us who don't have a background in C. This
> is very clear (to me).
>
>     foreach my $ii ( 0 .. $#hdregs ) {

My usage was called `C-style', but in fact I used it because of
familiarity with awk.  Probably the awk style was borrowed from C
anyway.  But aside from yours being shorter, a formulation like yours
leaves me wondering what it's doing.  Probably due to my lack of
familiarity with perl, I guess.
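
For anyone else puzzled by the range form, a small sketch may help.
The patterns in @hdregs here are hypothetical stand-ins, not the real
ones from the script. `0 .. $#hdregs' builds the list of indexes from
0 up to the last index of the array, and $ii takes each value in turn.

```perl
use strict;
use warnings;

# Hypothetical header patterns, standing in for the real @hdregs.
my @hdregs = (qr/^From:/, qr/^Subject:/, qr/^Received:/);

# C-style (or awk-style) loop: init; test; increment.
my @c_style;
for (my $ii = 0; $ii <= $#hdregs; $ii++) {
    push @c_style, $ii;
}

# Range form: 0 .. $#hdregs expands to the list (0, 1, 2).
my @range_style;
foreach my $ii (0 .. $#hdregs) {
    push @range_style, $ii;
}

print "@c_style\n";      # prints "0 1 2"
print "@range_style\n";  # prints "0 1 2"
```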

> : I wanted a way to ensure that each reg has hit at
> : least once.  Otherwise we don't print.  So I used a
> : formulation like this (Not posted previously for
> : clarity):
> : 
> :          if ($data{$hdregs[$ii]}++ == 0) {
> :            ## it will only be 0 once
> :            $hdelem_hit_cnt++; 
> :          }
> : Then before printing we compare $hdelem_hit_cnt to
> : ($#hdregs + 1):
> : 
> :  sub test_hdr_good {
> :     if ($hdelem_hit_cnt == ($#hdregs + 1)) {
> :       $test_hdr_good = "TRUE";
> :       $hdelem_hit_cnt = 0;
>
>     Generally, global variables should raise a giant,
> blinking, annoying sign telling us we are no
> longer in Kansas.

I didn't post it, but in fact I have a `my' declaration like this at
the beginning of my `sub wanted {':

 my($line,@hdhits,$hdelem_hit_cnt,%data); 

Another one at the beginning of the script tries to catch
everything that didn't need to be local to a loop of some kind.

> : They should be the same if all regs have hit at least
> : once.  If not the same... we don't print.
>
>     Actually, they should be the same if all regs were
> hit /only/ once.

No, you'd have to try it to see that is not true. 

That was the beauty of it to me.  The counter only increments when a
UNIQUE hit happens, not when repeated hits happen.  So a repeat hit
does not get mistaken for a UNIQUE hit and throw off the count.
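
A minimal sketch of that behaviour, with invented patterns and header
lines (the variable names follow the snippet quoted above, the rest is
assumed for illustration). Two `Received:' lines bump the per-regex
count twice but the unique counter only once.

```perl
use strict;
use warnings;

# Hypothetical patterns and header lines, made up for illustration.
my @hdregs = (qr/^From:/, qr/^Received:/);
my @lines  = (
    'Received: from a.example',
    'Received: from b.example',
    'From: someone@example.org',
);

my %data;
my $hdelem_hit_cnt = 0;

for my $line (@lines) {
    for my $ii (0 .. $#hdregs) {
        next unless $line =~ $hdregs[$ii];
        # Post-increment returns the old value (0 for a fresh key),
        # so this test is true only on the first hit for each regex.
        if ($data{$hdregs[$ii]}++ == 0) {
            $hdelem_hit_cnt++;
        }
    }
}

print "unique hits: $hdelem_hit_cnt\n";           # prints "unique hits: 2"
print "all hit\n" if $hdelem_hit_cnt == @hdregs;  # prints "all hit"
```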

>     Depending on where the 'if' block is located, this
> is a roundabout way to test that @hdregs is an array of
> unique values. It would be similar to this outside the
> 'for' loop.

@hdregs is not intended to be an array of unique values.  There are
cases where I want to print repeated headers like `Received:' lines.

>     But as Randy mentioned, some mail headers are allowed
> to appear more than once. Thus making this test invalid.

Yes, that was what started this thread: my desire to include those in
the hits.  For that reason I chose the `awkward' unique hit technique,
so that it would still verify that all regex had hit at least once,
but not prevent printing of repeated hits.

I'm sure there are better ways to do that, but I haven't thought of
any yet.  Even after commentary here, I'm not thinking of a slicker
or nicer way to do that.
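
One possible alternative, offered only as a sketch and not as what the
script does: record which regex indexes have hit in a hash, and
compare the number of keys against the number of regex. The %seen hash
and the sample data are assumptions for illustration; @hdhits is
borrowed from the `my' declaration above, with a guessed purpose.

```perl
use strict;
use warnings;

# Hypothetical patterns and header lines, made up for illustration.
my @hdregs = (qr/^From:/, qr/^Received:/, qr/^Subject:/);
my @lines  = (
    'Received: from a.example',
    'Received: from b.example',
    'From: someone@example.org',
);

my (%seen, @hdhits);
for my $line (@lines) {
    my $matched = 0;
    for my $ii (0 .. $#hdregs) {
        next unless $line =~ $hdregs[$ii];
        $seen{$ii} = 1;   # remember that regex number $ii has hit
        $matched   = 1;
    }
    push @hdhits, $line if $matched;   # keep every hit, repeats included
}

# All regex have hit at least once exactly when every index is a key.
if (keys(%seen) == @hdregs) {
    print "$_\n" for @hdhits;
} else {
    print "no hits\n";   # `Subject:' never matched, so this branch runs
}
```

This keeps the restrictive all-or-nothing behaviour while still
collecting repeated headers, at the cost of one extra hash.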

An important ingredient in this script is that it returns nothing but
a report of no hits unless ALL regex have found a hit.  It is
intended to be restrictive.  For a more inclusive return one lessens
the number or precision of the regex.

This script will have both header and body regex to find in most
cases, although it can be run for just one or the other.  The idea is
to purposely use a restrictive number of regex to have very good
precision over what messages get turned up.
 

