https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6735
Bug #: 6735
Summary: sa-learn and message max size setting
Product: Spamassassin
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: spamassassin
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
I run a mail scanning infrastructure based on MailScanner
( http://mailscanner.info/ ). This bug is not MailScanner-specific, however.
MailScanner calls SpamAssassin directly through its Perl modules, so
spamc/spamd is not involved, and that is where the problem arises.
When you run sa-learn in such an environment, there is a hardcoded limit on
the size of the spam/ham messages it will learn tokens from. I'm not sure
exactly what the limit is (somebody on the mailing list said 256 KB).
If I try to train my Bayes database with sa-learn on 1 MB or 2 MB spam mails,
it doesn't work: no tokens are learned.
On the mailing list I was told that I could change the following lines in
sa-learn:
my $iter = new Mail::SpamAssassin::ArchiveIterator(
  {
    'opt_all' => 0, # skip messages over 250k
    'opt_want_date' => 0,
  }
);
If you change opt_all to 1 instead of the default 0, it works, and even large
spam mails can be learned. That change gave me a temporary fix, so I can train
my Bayes database properly even on the abnormally large spam mails I get from
time to time.
But as was mentioned on the mailing list, that fix is obscure and more or
less impossible for a normal user to find.
So I suggest adding a command-line parameter or config file option so that
the limit is configurable in a simple and straightforward way.
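For illustration only, here is a rough Perl sketch of how such a switch might
feed the option hash quoted above. The --max-size name and the
iterator_options helper are hypothetical and do not exist in sa-learn as far
as I know; only 'opt_all' and 'opt_want_date' come from the actual source.
The sketch models just the "no limit" case, because opt_all is the only
size-related setting I know of.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: map a hypothetical --max-size=BYTES switch onto the
# option hash that sa-learn passes to ArchiveIterator.
sub iterator_options {
    my @args = @_;
    my $max_size;    # stays undef when the switch is not given

    GetOptionsFromArray(\@args, 'max-size=i' => \$max_size)
        or die "usage: sa-learn [--max-size=BYTES] ...\n";

    return {
        # Assumption: --max-size=0 means "no limit", which is what
        # opt_all => 1 does today. Any other value keeps the default.
        'opt_all'       => (defined $max_size && $max_size == 0) ? 1 : 0,
        'opt_want_date' => 0,
    };
}

my $opts = iterator_options(@ARGV);
printf "opt_all=%d\n", $opts->{'opt_all'};
```

Run without arguments, this keeps the current behaviour (opt_all stays 0);
run with --max-size=0, it sets opt_all to 1, matching the manual edit
described above.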
As far as I can tell, the situation is the same for all 3.x versions, so I'm
marking the version as unspecified.
I hope the developers agree this should be changed.
Best regards
Jonas Akrouh Larsen