https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6735

             Bug #: 6735
           Summary: sa-learn and message max size setting
           Product: Spamassassin
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: spamassassin
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


I run a mail scanning infrastructure based on MailScanner (
http://mailscanner.info/ ). This bug, however, is not MailScanner-specific.

MailScanner uses SpamAssassin directly via its Perl module, so spamc/spamd
is not involved, and that is where the problem arises.

When you run sa-learn in such an environment, there is a hardcoded limit on the
size of spam/ham messages you can learn tokens from. I'm not sure what the
limit is precisely (somebody on the mailing list said 256 KB).

If I try to train my Bayes database with sa-learn on 1 MB or 2 MB spam mails,
it doesn't work: no tokens are learned.

On the mailing list I was told that I could change the following lines in
sa-learn:

my $iter = new Mail::SpamAssassin::ArchiveIterator(
    {
      'opt_all' => 0,       # skip messages over 250k
      'opt_want_date' => 0,
    }
  );

If you change opt_all to 1 instead of the default 0, it works, and even big
spam mails can be learned. With that change I have a temporary fix and can
train my Bayes database properly, even on the abnormally big spam mails I get
from time to time.

But as was mentioned on the mailing list, that fix is obscure and more or
less impossible for a normal user to find.

So I suggest adding a command-line parameter or config file option so that it
is configurable in a simple and straightforward way.
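
To illustrate the idea, here is a minimal sketch, assuming a hypothetical
--learn-all switch (the name is made up for this example and is not an
existing sa-learn option). It only builds the option hash shown above from
the command line using Getopt::Long; it does not call ArchiveIterator itself,
so it runs without SpamAssassin installed. How this would best be wired into
sa-learn is for the developers to decide.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn command-line arguments into the option hash
# that sa-learn passes to Mail::SpamAssassin::ArchiveIterator.
# '--learn-all' is an invented flag name, not an existing sa-learn option.
sub build_iterator_options {
    my @args = @_;

    # Default of 0 matches the value currently hardcoded in sa-learn.
    my $learn_all = 0;

    GetOptionsFromArray(\@args, 'learn-all!' => \$learn_all)
        or die "usage: $0 [--learn-all]\n";

    return {
        'opt_all'       => $learn_all ? 1 : 0,  # 1 = do not skip big messages
        'opt_want_date' => 0,
    };
}

# Print the resulting options when the script is run directly.
unless (caller) {
    my $opts = build_iterator_options(@ARGV);
    printf "opt_all=%d opt_want_date=%d\n",
        $opts->{opt_all}, $opts->{opt_want_date};
}
```

With this sketch, running the script with no arguments keeps opt_all at 0,
while passing --learn-all sets it to 1, which is the manual edit described
above made selectable at run time.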

As far as I can tell, the behaviour is the same for all 3.x versions, so I'm
marking the version number as unspecified.

I hope the developers agree this should be changed.

Best regards

Jonas Akrouh Larsen

