http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5141

           Summary: ArchiveIterator::message_array() etc keep file list in
                    memory
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Libraries
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


For the past two weeks my nightly runs have been hitting OOM during the scan
phase of mass-check.  After finally doing some debugging, it turns out the
problem is that I've accepted more spam than I used to, so storing all the
records in memory goes over the process's limit.

That is, during the scan phase ArchiveIterator stores the spam/ham message
listings in two array references, $self->{s} and $self->{h}, then merges them
into a @messages array (so until the two reference vars are undef'ed, that's
2x memory usage), does more processing, and returns that array for the run
phase.

This isn't really an issue most of the time because spamassassin and sa-learn
typically don't process a huge number of messages.  mass-check, however, can
process several hundred thousand (or more) messages at a time, and keeping all
that information in memory can cause OOMs.

So I suggest two things:

- I'm going to commit a patch shortly which at least cuts the memory usage for
"mass-check -n" down a bit so that my nightly runs can actually run.

- We ought to use temp files for the ham/spam arrays, and then process them out
to a third temp file.  That way, memory use will be minimal, mass-check can
stop doing the "fork a process for scanning" thing, and everything will be
happier.

Note: this assumes that there's enough temp disk space to store the indexes, but
that's much more likely than having enough RAM IMO.
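
To make the temp-file idea concrete, here is a minimal sketch.  It is in Python
rather than Perl, and it is not ArchiveIterator code: the function names, the
one-path-per-line record format, and the "h"/"s" tags are all assumptions made
for illustration.  The "processing" step is reduced to a plain concatenation;
whatever ordering or filtering the real merge does would have to happen in that
loop or in a later pass over the file.

```python
import tempfile


def spool_index(records):
    """Write one record per line to an anonymous temp file, then rewind it.

    Assumes records never contain a newline (hypothetical format).
    """
    spool = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    for record in records:
        spool.write(record + "\n")
    spool.seek(0)
    return spool


def merge_spools(ham_spool, spam_spool):
    """Stream both spools into a third temp file, tagging each record.

    Only one line is held in memory at a time; the source spools are
    closed (and so deleted) as soon as they have been copied.
    """
    merged = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    for tag, spool in (("h", ham_spool), ("s", spam_spool)):
        for line in spool:
            merged.write(f"{tag}\t{line}")
        spool.close()
    merged.seek(0)
    return merged


def iter_messages(merged):
    """Yield (tag, path) pairs lazily for the run phase."""
    for line in merged:
        tag, path = line.rstrip("\n").split("\t", 1)
        yield tag, path
```

With this shape the scan phase can feed spool_index() from a generator, so the
full listing never exists as an in-memory array, and the run phase just walks
iter_messages().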


