What does "unreadable for rule analysis" mean? Surely no one is actually manually reading such lines one rule at a time? Computers can check and grep for you.. ;-)
I think this needs a little bit more of thought what we really want to accomplish here and maybe do it in a bug along with the new templates and stuff if needed.. On Thu, Jun 06, 2019 at 07:48:02AM -0400, Kevin A. McGrail wrote: > That is a frightening one liner. Should we use it? > > As for the more output comment, if you have emails with 300 lower case e's, > you > get 300 hits for the subtext. It is unreadable for rule analysis. > > As for modifying the normal output, I have no idea if anyone out there is > using > the public routine so better to be safe. > > I didn't find a tag for subtests either. That might be a good 4.0 addition. > > Regards, KAM > > On Thu, Jun 6, 2019, 01:30 Henrik K <[1]h...@hege.li> wrote: > > > Well in theory you see _more_ debug output now when there are no > duplicates, > due to the stats string.. honestly atleast I wouldn't care about that. > Feel free to vote. > > As a silly morning exercise, here's a one-liner that compacts stuff :-P > > my $foo = '__A,__B,__C,__C,__C,__CC,__D,__D,__E,__E'; > my $m; $foo =~ s/([^,]+)(?{$m=1})(?:,\1(?=,|$)(?{$m++}))+/"$1($m)"/eg; > > __A,__B,__C(3),__CC,__D(2),__E(2) > > > On Wed, Jun 05, 2019 at 08:25:00PM -0400, Kevin A. McGrail wrote: > > Good point, Henrik & John. > > > > OK, I've left the output alone except for the calls from dbg so it > > shouldn't break anything in the public interface. > > > > Thoughts on this version? > > > > Regards, > > KAM > > > > On 6/4/2019 1:51 PM, John Hardin wrote: > > > On Tue, 4 Jun 2019, Kevin A. McGrail wrote: > > > > > >> Yes, I was thinking about that and wanting to fix uritests so well > > >> for the > > >> template. Thanks for the feedback. I will take another pass at it. > > > > > > Just do the deduplication without modifying the output format. > > > > > > If we want to log the hit counts, then make another function that does > > > what you did and use it for logging. > > > > > > > > >> On Tue, Jun 4, 2019, 03:23 Henrik K <[2]h...@hege.li> wrote: > > >> > > >>> > > >>> If you want to modify debug output, you have to modify only the > dbg() > > >>> output > > >>> itself. You can't modify internal functions that have specific > output > > >>> formats and start adding random strings to them. Atleast these > places > > >>> depend on the comma delimited rules: > > >>> > > >>> ./masses/mass-check: push @tests, split(/,/, > > >>> $status->get_names_of_subtests_hit()); > > >>> ./t/rule_tests.t: my %rules_hit = map { $_ => 1 } > > >>> split(/,/,$msg->get_names_of_tests_hit()), > > >>> split(/,/,$msg->get_names_of_subtests_hit()); > > >>> ./t.rules/run: my $testsline = > > >>> $status->get_names_of_tests_hit().",".$status-> > get_names_of_subtests_hit(); > > >>> > > >>> > > >>> > > >>> > > >>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote: > > >>>> Morning All, > > >>>> > > >>>> After a few thoughts on limits, it appears that any duplicate > subtest > > >>>> hits are best combined for debug output. > > >>>> > > >>>> Any thoughts on the attached? It looks like it will help me with > rule > > >>>> development while support rules with valid but large maxhits like > > >>> __LOWER_E > > >>>> > > >>>> Regards, > > >>>> KAM > > >>>> > > >>>> On 5/31/2019 10:30 AM, Bill Cole wrote: > > >>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote: > > >>>>> > > >>>>>> I was curious if anyone noticed the debug output for subtests has > > >>> gotten > > >>>>>> insane: > > >>>>> > > >>>>> It got a little discussion on users@ when I created those rules. > > >>>>> > > >>>>> [...] > > >>>>> > > >>>>>> [3]72_active.cf: body __LOWER_E /e/ > > >>>>>> [4]72_active.cf: tflags __LOWER_E multiple > > >>>>>> maxhits=230 > > >>>>>> > > >>>>>> [5]72_active.cf: body __E_LIKE_LETTER /<lcase_e>/ > > >>>>>> [6]72_active.cf: tflags __E_LIKE_LETTER multiple > > >>>>>> maxhits=320 > > >>>>>> > > >>>>>> Assuming those maxhits are correct, > > >>>>> > > >>>>> They are. In fact they were carefully tuned to catch the targeted > > >>>>> extortion spam. > > >>>>> > > >>>>>> maybe we need something in the debug > > >>>>>> output that says __E_LIKE_LETTER (number of hits if more than 1). > > >>>>> > > >>>>> That would be a useful enhancement even without my flagrant log > > >>>>> vandalism. > > >>>>> > > >>>> > > >>>> -- > > >>>> Kevin A. McGrail > > >>>> Member, Apache Software Foundation > > >>>> Chair Emeritus Apache SpamAssassin Project > > >>>> [7]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > >>>> > > >>> > > >>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > >>>> =================================================================== > > >>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582) > > >>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > >>>> @@ -769,7 +769,38 @@ > > >>>> sub get_names_of_subtests_hit { > > >>>> my ($self) = @_; > > >>>> > > >>>> - return join(',', sort @{$self->{subtest_names_hit}}); > > >>>> + #return join(',', sort @{$self->{subtest_names_hit}}); > > >>>> + > > >>>> + #This routine prints only one instance of a subrule hit with a > > >>>> count > > >>> of how many times it hit if greater than 1 > > >>>> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, > $rule, > > >>> $total_hits, $deduplicated_hits); > > >>>> + > > >>>> + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > >>>> + > > >>>> + for ($i=0; $i < $total_hits; $i++) { > > >>>> + $rule = ${$self->{subtest_names_hit}}[$i]; > > >>>> + $subtest_names_hit{$rule}++; > > >>>> + } > > >>>> + > > >>>> + foreach $key (keys %subtest_names_hit) { > > >>>> + push (@keys, $key); > > >>>> + } > > >>>> + @sorted = sort @keys; > > >>>> + > > >>>> + $deduplicated_hits = scalar(@sorted); > > >>>> + > > >>>> + for ($i=0; $i < $deduplicated_hits; $i++) { > > >>>> + $string .= $sorted[$i]; > > >>>> + if ($subtest_names_hit{$sorted[$i]} > 1) { > > >>>> + $string .= "($subtest_names_hit{$sorted[$i]})" > > >>>> + } > > >>>> + $string .= ","; > > >>>> + } > > >>>> + > > >>>> + $string =~ s/,$//; > > >>>> + > > >>>> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated > Total > > >>> Hits: $deduplicated_hits)"; > > >>>> + > > >>>> + return $string; > > >>>> } > > >>>> > > >>>> > > >>> > ##################################################################### > ###### > > >>> > > >>> > > >>> > > >> > > > > > > > -- > > Kevin A. McGrail > > Member, Apache Software Foundation > > Chair Emeritus Apache SpamAssassin Project > > [8]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > > > Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > =================================================================== > > --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582) > > +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > @@ -398,7 +398,7 @@ > > dbg("check: is spam? score=".$self->{score}. > > " required=".$self->{conf}->{required_score}); > > dbg("check: tests=".$self->get_names_of_tests_hit()); > > - dbg("check: subtests=".$self->get_names_of_subtests_hit()); > > + dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg")); > > $self->{is_spam} = $self->is_spam(); > > > > $self->{main}->{resolver}->bgabort(); > > @@ -764,12 +764,52 @@ > > normally-hidden rules, which score 0 and have names beginning with two > > underscores, used in meta rules. > > > > +If a parameter of dbg is passed, the output will be more condensed and > > +sub-tests with multiple hits reduced to one entry with the number of > hits > > +in parentheses. Some information is also added at the end regarding the > > +multiple hits. > > + > > =cut > > > > sub get_names_of_subtests_hit { > > - my ($self) = @_; > > + my ($self, $mode) = @_; > > > > - return join(',', sort @{$self->{subtest_names_hit}}); > > + if (defined $mode && $mode eq 'dbg') { > > + #This routine prints only one instance of a subrule hit with a > count > of how many times it hit if greater than 1 > > + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule, > $total_hits, $deduplicated_hits); > > + > > + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > + > > + for ($i=0; $i < $total_hits; $i++) { > > + $rule = ${$self->{subtest_names_hit}}[$i]; > > + $subtest_names_hit{$rule}++; > > + } > > + > > + foreach $key (keys %subtest_names_hit) { > > + push (@keys, $key); > > + } > > + @sorted = sort @keys; > > + > > + $deduplicated_hits = scalar(@sorted); > > + > > + for ($i=0; $i < $deduplicated_hits; $i++) { > > + $string .= $sorted[$i]; > > + if ($subtest_names_hit{$sorted[$i]} > 1) { > > + $string .= "($subtest_names_hit{$sorted[$i]})" > > + } > > + $string .= ","; > > + } > > + > > + $string =~ s/,$//; > > + > > + $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total > Hits: $deduplicated_hits)"; > > + > > + return $string; > > + > > + } else { > > + #return the simpler string with duplicates and commas > > + return join(',', sort @{$self->{subtest_names_hit}}); > > + } > > } > > > > > ######################################################################## > ### > > > > On Thu, Jun 6, 2019, 01:30 Henrik K <[9]h...@hege.li> wrote: > > > Well in theory you see _more_ debug output now when there are no > duplicates, > due to the stats string.. honestly atleast I wouldn't care about that. > Feel free to vote. > > As a silly morning exercise, here's a one-liner that compacts stuff :-P > > my $foo = '__A,__B,__C,__C,__C,__CC,__D,__D,__E,__E'; > my $m; $foo =~ s/([^,]+)(?{$m=1})(?:,\1(?=,|$)(?{$m++}))+/"$1($m)"/eg; > > __A,__B,__C(3),__CC,__D(2),__E(2) > > > On Wed, Jun 05, 2019 at 08:25:00PM -0400, Kevin A. McGrail wrote: > > Good point, Henrik & John. > > > > OK, I've left the output alone except for the calls from dbg so it > > shouldn't break anything in the public interface. > > > > Thoughts on this version? > > > > Regards, > > KAM > > > > On 6/4/2019 1:51 PM, John Hardin wrote: > > > On Tue, 4 Jun 2019, Kevin A. McGrail wrote: > > > > > >> Yes, I was thinking about that and wanting to fix uritests so well > > >> for the > > >> template. Thanks for the feedback. I will take another pass at it. > > > > > > Just do the deduplication without modifying the output format. > > > > > > If we want to log the hit counts, then make another function that does > > > what you did and use it for logging. > > > > > > > > >> On Tue, Jun 4, 2019, 03:23 Henrik K <[10]h...@hege.li> wrote: > > >> > > >>> > > >>> If you want to modify debug output, you have to modify only the > dbg() > > >>> output > > >>> itself. You can't modify internal functions that have specific > output > > >>> formats and start adding random strings to them. Atleast these > places > > >>> depend on the comma delimited rules: > > >>> > > >>> ./masses/mass-check: push @tests, split(/,/, > > >>> $status->get_names_of_subtests_hit()); > > >>> ./t/rule_tests.t: my %rules_hit = map { $_ => 1 } > > >>> split(/,/,$msg->get_names_of_tests_hit()), > > >>> split(/,/,$msg->get_names_of_subtests_hit()); > > >>> ./t.rules/run: my $testsline = > > >>> $status->get_names_of_tests_hit().",".$status-> > get_names_of_subtests_hit(); > > >>> > > >>> > > >>> > > >>> > > >>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote: > > >>>> Morning All, > > >>>> > > >>>> After a few thoughts on limits, it appears that any duplicate > subtest > > >>>> hits are best combined for debug output. > > >>>> > > >>>> Any thoughts on the attached? It looks like it will help me with > rule > > >>>> development while support rules with valid but large maxhits like > > >>> __LOWER_E > > >>>> > > >>>> Regards, > > >>>> KAM > > >>>> > > >>>> On 5/31/2019 10:30 AM, Bill Cole wrote: > > >>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote: > > >>>>> > > >>>>>> I was curious if anyone noticed the debug output for subtests has > > >>> gotten > > >>>>>> insane: > > >>>>> > > >>>>> It got a little discussion on users@ when I created those rules. > > >>>>> > > >>>>> [...] > > >>>>> > > >>>>>> [11]72_active.cf: body __LOWER_E /e/ > > >>>>>> [12]72_active.cf: tflags __LOWER_E multiple > > >>>>>> maxhits=230 > > >>>>>> > > >>>>>> [13]72_active.cf: body __E_LIKE_LETTER /<lcase_e>/ > > >>>>>> [14]72_active.cf: tflags __E_LIKE_LETTER multiple > > >>>>>> maxhits=320 > > >>>>>> > > >>>>>> Assuming those maxhits are correct, > > >>>>> > > >>>>> They are. In fact they were carefully tuned to catch the targeted > > >>>>> extortion spam. > > >>>>> > > >>>>>> maybe we need something in the debug > > >>>>>> output that says __E_LIKE_LETTER (number of hits if more than 1). > > >>>>> > > >>>>> That would be a useful enhancement even without my flagrant log > > >>>>> vandalism. > > >>>>> > > >>>> > > >>>> -- > > >>>> Kevin A. McGrail > > >>>> Member, Apache Software Foundation > > >>>> Chair Emeritus Apache SpamAssassin Project > > >>>> [15]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > >>>> > > >>> > > >>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > >>>> =================================================================== > > >>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582) > > >>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > >>>> @@ -769,7 +769,38 @@ > > >>>> sub get_names_of_subtests_hit { > > >>>> my ($self) = @_; > > >>>> > > >>>> - return join(',', sort @{$self->{subtest_names_hit}}); > > >>>> + #return join(',', sort @{$self->{subtest_names_hit}}); > > >>>> + > > >>>> + #This routine prints only one instance of a subrule hit with a > > >>>> count > > >>> of how many times it hit if greater than 1 > > >>>> + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, > $rule, > > >>> $total_hits, $deduplicated_hits); > > >>>> + > > >>>> + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > >>>> + > > >>>> + for ($i=0; $i < $total_hits; $i++) { > > >>>> + $rule = ${$self->{subtest_names_hit}}[$i]; > > >>>> + $subtest_names_hit{$rule}++; > > >>>> + } > > >>>> + > > >>>> + foreach $key (keys %subtest_names_hit) { > > >>>> + push (@keys, $key); > > >>>> + } > > >>>> + @sorted = sort @keys; > > >>>> + > > >>>> + $deduplicated_hits = scalar(@sorted); > > >>>> + > > >>>> + for ($i=0; $i < $deduplicated_hits; $i++) { > > >>>> + $string .= $sorted[$i]; > > >>>> + if ($subtest_names_hit{$sorted[$i]} > 1) { > > >>>> + $string .= "($subtest_names_hit{$sorted[$i]})" > > >>>> + } > > >>>> + $string .= ","; > > >>>> + } > > >>>> + > > >>>> + $string =~ s/,$//; > > >>>> + > > >>>> + $string .= " (Total Subtest Hits: $total_hits / Deduplicated > Total > > >>> Hits: $deduplicated_hits)"; > > >>>> + > > >>>> + return $string; > > >>>> } > > >>>> > > >>>> > > >>> > ##################################################################### > ###### > > >>> > > >>> > > >>> > > >> > > > > > > > -- > > Kevin A. McGrail > > Member, Apache Software Foundation > > Chair Emeritus Apache SpamAssassin Project > > [16]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > > > Index: lib/Mail/SpamAssassin/PerMsgStatus.pm > > =================================================================== > > --- lib/Mail/SpamAssassin/PerMsgStatus.pm (revision 1860582) > > +++ lib/Mail/SpamAssassin/PerMsgStatus.pm (working copy) > > @@ -398,7 +398,7 @@ > > dbg("check: is spam? score=".$self->{score}. > > " required=".$self->{conf}->{required_score}); > > dbg("check: tests=".$self->get_names_of_tests_hit()); > > - dbg("check: subtests=".$self->get_names_of_subtests_hit()); > > + dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg")); > > $self->{is_spam} = $self->is_spam(); > > > > $self->{main}->{resolver}->bgabort(); > > @@ -764,12 +764,52 @@ > > normally-hidden rules, which score 0 and have names beginning with two > > underscores, used in meta rules. > > > > +If a parameter of dbg is passed, the output will be more condensed and > > +sub-tests with multiple hits reduced to one entry with the number of > hits > > +in parentheses. Some information is also added at the end regarding the > > +multiple hits. > > + > > =cut > > > > sub get_names_of_subtests_hit { > > - my ($self) = @_; > > + my ($self, $mode) = @_; > > > > - return join(',', sort @{$self->{subtest_names_hit}}); > > + if (defined $mode && $mode eq 'dbg') { > > + #This routine prints only one instance of a subrule hit with a > count > of how many times it hit if greater than 1 > > + my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule, > $total_hits, $deduplicated_hits); > > + > > + $total_hits = scalar(@{$self->{subtest_names_hit}}); > > + > > + for ($i=0; $i < $total_hits; $i++) { > > + $rule = ${$self->{subtest_names_hit}}[$i]; > > + $subtest_names_hit{$rule}++; > > + } > > + > > + foreach $key (keys %subtest_names_hit) { > > + push (@keys, $key); > > + } > > + @sorted = sort @keys; > > + > > + $deduplicated_hits = scalar(@sorted); > > + > > + for ($i=0; $i < $deduplicated_hits; $i++) { > > + $string .= $sorted[$i]; > > + if ($subtest_names_hit{$sorted[$i]} > 1) { > > + $string .= "($subtest_names_hit{$sorted[$i]})" > > + } > > + $string .= ","; > > + } > > + > > + $string =~ s/,$//; > > + > > + $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total > Hits: $deduplicated_hits)"; > > + > > + return $string; > > + > > + } else { > > + #return the simpler string with duplicates and commas > > + return join(',', sort @{$self->{subtest_names_hit}}); > > + } > > } > > > > > ######################################################################## > ### > > > > References: > > [1] mailto:h...@hege.li > [2] mailto:h...@hege.li > [3] http://72_active.cf/ > [4] http://72_active.cf/ > [5] http://72_active.cf/ > [6] http://72_active.cf/ > [7] https://www.linkedin.com/in/kmcgrail > [8] https://www.linkedin.com/in/kmcgrail > [9] mailto:h...@hege.li > [10] mailto:h...@hege.li > [11] http://72_active.cf/ > [12] http://72_active.cf/ > [13] http://72_active.cf/ > [14] http://72_active.cf/ > [15] https://www.linkedin.com/in/kmcgrail > [16] https://www.linkedin.com/in/kmcgrail