Good point, Henrik & John.

OK, I've left the output alone except for the calls from dbg so it
shouldn't break anything in the public interface.

Thoughts on this version?

Regards,
KAM

On 6/4/2019 1:51 PM, John Hardin wrote:
> On Tue, 4 Jun 2019, Kevin A. McGrail wrote:
>
>> Yes, I was thinking about that and wanting to fix uritests so well
>> for the
>> template.   Thanks for the feedback.  I will take another pass at it.
>
> Just do the deduplication without modifying the output format.
>
> If we want to log the hit counts, then make another function that does
> what you did and use it for logging.
>
>
>> On Tue, Jun 4, 2019, 03:23 Henrik K <[email protected]> wrote:
>>
>>>
>>> If you want to modify debug output, you have to modify only the dbg()
>>> output
>>> itself.  You can't modify internal functions that have specific output
>>> formats and start adding random strings to them.  Atleast these places
>>> depend on the comma delimited rules:
>>>
>>> ./masses/mass-check:    push @tests, split(/,/,
>>> $status->get_names_of_subtests_hit());
>>> ./t/rule_tests.t:    my %rules_hit = map { $_ => 1 }
>>> split(/,/,$msg->get_names_of_tests_hit()),
>>> split(/,/,$msg->get_names_of_subtests_hit());
>>> ./t.rules/run:  my $testsline =
>>> $status->get_names_of_tests_hit().",".$status->get_names_of_subtests_hit();
>>>
>>>
>>>
>>>
>>> On Tue, Jun 04, 2019 at 01:56:26AM -0400, Kevin A. McGrail wrote:
>>>> Morning All,
>>>>
>>>> After a few thoughts on limits, it appears that any duplicate subtest
>>>> hits are best combined for debug output.
>>>>
>>>> Any thoughts on the attached?  It looks like it will help me with rule
>>>> development while support rules with valid but large maxhits like
>>> __LOWER_E
>>>>
>>>> Regards,
>>>> KAM
>>>>
>>>> On 5/31/2019 10:30 AM, Bill Cole wrote:
>>>>> On 30 May 2019, at 20:35, Kevin A. McGrail wrote:
>>>>>
>>>>>> I was curious if anyone noticed the debug output for subtests has
>>> gotten
>>>>>> insane:
>>>>>
>>>>> It got a little discussion on users@ when I created those rules.
>>>>>
>>>>> [...]
>>>>>
>>>>>> 72_active.cf:    body            __LOWER_E       /e/
>>>>>> 72_active.cf:    tflags          __LOWER_E       multiple
>>>>>> maxhits=230
>>>>>>
>>>>>> 72_active.cf:    body            __E_LIKE_LETTER /<lcase_e>/
>>>>>> 72_active.cf:    tflags          __E_LIKE_LETTER multiple
>>>>>> maxhits=320
>>>>>>
>>>>>> Assuming those maxhits are correct,
>>>>>
>>>>> They are. In fact they were carefully tuned to catch the targeted
>>>>> extortion spam.
>>>>>
>>>>>> maybe we need something in the debug
>>>>>> output that says __E_LIKE_LETTER (number of hits if more than 1).
>>>>>
>>>>> That would be a useful enhancement even without my flagrant log
>>>>> vandalism.
>>>>>
>>>>
>>>> -- 
>>>> Kevin A. McGrail
>>>> Member, Apache Software Foundation
>>>> Chair Emeritus Apache SpamAssassin Project
>>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>>
>>>
>>>> Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
>>>> ===================================================================
>>>> --- lib/Mail/SpamAssassin/PerMsgStatus.pm       (revision 1860582)
>>>> +++ lib/Mail/SpamAssassin/PerMsgStatus.pm       (working copy)
>>>> @@ -769,7 +769,38 @@
>>>>  sub get_names_of_subtests_hit {
>>>>    my ($self) = @_;
>>>>
>>>> -  return join(',', sort @{$self->{subtest_names_hit}});
>>>> +  #return join(',', sort @{$self->{subtest_names_hit}});
>>>> +
>>>> +  #This routine prints only one instance of a subrule hit with a
>>>> count
>>> of how many times it hit if greater than 1
>>>> +  my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule,
>>> $total_hits, $deduplicated_hits);
>>>> +
>>>> +  $total_hits = scalar(@{$self->{subtest_names_hit}});
>>>> +
>>>> +  for ($i=0; $i < $total_hits; $i++) {
>>>> +    $rule = ${$self->{subtest_names_hit}}[$i];
>>>> +    $subtest_names_hit{$rule}++;
>>>> +  }
>>>> +
>>>> +  foreach $key (keys %subtest_names_hit) {
>>>> +    push (@keys, $key);
>>>> +  }
>>>> +  @sorted = sort @keys;
>>>> +
>>>> +  $deduplicated_hits = scalar(@sorted);
>>>> +
>>>> +  for ($i=0; $i < $deduplicated_hits; $i++) {
>>>> +    $string .= $sorted[$i];
>>>> +    if ($subtest_names_hit{$sorted[$i]} > 1) {
>>>> +      $string .= "($subtest_names_hit{$sorted[$i]})"
>>>> +    }
>>>> +    $string .= ",";
>>>> +  }
>>>> +
>>>> +  $string =~ s/,$//;
>>>> +
>>>> +  $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total
>>> Hits: $deduplicated_hits)";
>>>> +
>>>> +  return $string;
>>>>  }
>>>>
>>>>
>>> ###########################################################################
>>>
>>>
>>>
>>
>

-- 
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

Index: lib/Mail/SpamAssassin/PerMsgStatus.pm
===================================================================
--- lib/Mail/SpamAssassin/PerMsgStatus.pm       (revision 1860582)
+++ lib/Mail/SpamAssassin/PerMsgStatus.pm       (working copy)
@@ -398,7 +398,7 @@
   dbg("check: is spam? score=".$self->{score}.
                         " required=".$self->{conf}->{required_score});
   dbg("check: tests=".$self->get_names_of_tests_hit());
-  dbg("check: subtests=".$self->get_names_of_subtests_hit());
+  dbg("check: subtests=".$self->get_names_of_subtests_hit("dbg"));
   $self->{is_spam} = $self->is_spam();
 
   $self->{main}->{resolver}->bgabort();
@@ -764,12 +764,52 @@
 normally-hidden rules, which score 0 and have names beginning with two
 underscores, used in meta rules.
 
+If a parameter of dbg is passed, the output will be more condensed and 
+sub-tests with multiple hits reduced to one entry with the number of hits 
+in parentheses. Some information is also added at the end regarding the 
+multiple hits.
+
 =cut
 
 sub get_names_of_subtests_hit {
-  my ($self) = @_;
+  my ($self, $mode) = @_;
 
-  return join(',', sort @{$self->{subtest_names_hit}});
+  if (defined $mode && $mode eq 'dbg') {
+    #This routine prints only one instance of a subrule hit with a count of 
how many times it hit if greater than 1
+    my (%subtest_names_hit, $i, $key, @keys, @sorted, $string, $rule, 
$total_hits, $deduplicated_hits);  
+  
+    $total_hits = scalar(@{$self->{subtest_names_hit}});
+  
+    for ($i=0; $i < $total_hits; $i++) {
+      $rule = ${$self->{subtest_names_hit}}[$i]; 
+      $subtest_names_hit{$rule}++; 
+    }
+  
+    foreach $key (keys %subtest_names_hit) {
+      push (@keys, $key);
+    }
+    @sorted = sort @keys;
+  
+    $deduplicated_hits = scalar(@sorted);
+  
+    for ($i=0; $i < $deduplicated_hits; $i++) {
+      $string .= $sorted[$i];
+      if ($subtest_names_hit{$sorted[$i]} > 1) {
+        $string .= "($subtest_names_hit{$sorted[$i]})"
+      }
+      $string .= ",";
+    }
+  
+    $string =~ s/,$//;
+  
+    $string .= " (Total Subtest Hits: $total_hits / Deduplicated Total Hits: 
$deduplicated_hits)";
+  
+    return $string;
+
+  } else {
+    #return the simpler string with duplicates and commas
+    return join(',', sort @{$self->{subtest_names_hit}});
+  }
 }
 
 ###########################################################################

Reply via email to