https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7342

Robbie Harwood <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #3 from Robbie Harwood <[email protected]> ---
(In reply to RW from comment #1)
> Why do you think it doesn't scale? I'd expect that the spamc overhead as a
> fraction of the resources needed to process a email would scale as roughly
> O(1).

Because during the several days it took to check the corpus, doing anything
else over the network was quite laggy (with a local spamd).  During a typical
exchange between spamc and spamd, more than half of the packets sent are TCP
overhead (exactly how much depends on timing and the TCP state machine).

(In reply to Joe Quinn from comment #2)
> Are you unable to just run more instances? That is often the easiest fix for
> unclaimed server resources.

Sure.  I spawned a spamc for all 150k messages.  It used so much RAM that the
oom killer stepped in.

Stepping back from that, I now need to write control logic to monitor the
number of instances and keep it below a certain threshold, which is another
symptom of the one-message-per-connection (and per-spamc) model.  Some more
approximate benchmarks follow:

$ chmod +x spamc_wrapper.sh
$ cat spamc_wrapper.sh
#!/bin/bash
spamc -c < "$1"
$ <list of files onto stdin> | xargs -n 1 -P 5 ./spamc_wrapper.sh

I have a 4-core machine (Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, so really
two cores with hyperthreading).  For P=5, the CPU load is negligible
throughout; it used maybe 300 MB of RAM and took 74 seconds to process 42
messages, which would lead to almost a day for the full corpus.  I chose 5
here because it is the default maximum number of children spamd can have.
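
A run like the one above can be made repeatable with a small timing wrapper.
This is only a minimal sketch, assuming an xargs that supports -P; run_sample
is a hypothetical helper (not part of SpamAssassin), and it takes the command
to run as arguments so the same function can be reused for each value of P:

```shell
#!/bin/bash
# Hypothetical helper: run a command once per input line (one file path per
# line on stdin) with at most $1 parallel workers, then report the number of
# inputs and the elapsed wall-clock seconds on stderr.
# Assumption: paths contain no whitespace or quotes, as with plain xargs.
run_sample() {
    rs_jobs=$1; shift
    rs_list=$(mktemp) || return 1
    cat > "$rs_list"                      # save the list so it can be counted
    rs_start=$(date +%s)
    xargs -n 1 -P "$rs_jobs" "$@" < "$rs_list"
    rs_end=$(date +%s)
    echo "$(wc -l < "$rs_list") messages, $((rs_end - rs_start)) seconds" >&2
    rm -f "$rs_list"
}
```

With the wrapper script above, a run would look something like
"<list of files onto stdin> | run_sample 5 ./spamc_wrapper.sh".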

$ perl -MSocket -e'print SOMAXCONN'
128
$ ps aux | grep /usr/sbin/spamd | grep -v grep
root     19483  8.6  0.6 157480 81316 ?        Ss   20:45   0:02
/usr/sbin/spamd --allow-tell --create-prefs --max-children 128
--helper-home-dir -d --pidfile=/var/run/spamd.pid
$ <list of files onto stdin> | xargs -n 1 -P 128 ./spamc_wrapper.sh

Entire system slowdown; all cores spinning wildly; 6 GB RAM usage above
baseline.  218 messages in 51 seconds, for an estimate of 7.5 days; this is
dramatically worse.

Lowering to 64: RAM usage is now bounded within what my system can support
(it's using just under 2GB).  CPUs spinning wildly.  104 seconds, 533 messages;
almost 9 days.

32: The system is basically tolerable to use here.  Occasionally it will use
all cores, but not always.  RAM usage at about a gig and a half.  84 seconds,
430 messages; almost 9 days.

16: Less than a gig additional RAM.  Very little CPU in use.  53 seconds, 135
messages; 3 days.

8: RAM overhead maybe 500 MB.  No CPU use to speak of.  61 seconds, 97
messages; 2.7 days.
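
The day figures above are extrapolations from a short sample to the whole
corpus.  A minimal sketch of that arithmetic, assuming the per-message rate
stays constant over the full run; estimate_days is a hypothetical helper, and
the corpus size is passed in as an argument rather than assumed:

```shell
#!/bin/bash
# Hypothetical helper: extrapolate a sample run to a full corpus.
# Usage: estimate_days <messages_processed> <elapsed_seconds> <corpus_size>
# Prints the estimated total run time in days (86400 seconds per day).
estimate_days() {
    awk -v m="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", (s / m) * n / 86400 }'
}
```

For example, "estimate_days 100 100 86400" prints 1.00: at one message per
second, 86400 messages take exactly one day.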

For completeness, it takes 54 seconds to generate the list of messages in the
corpus and count the lines (pipe to wc -l).

If this were bound by the regexes, I would expect to see close to full CPU
utilization with 4 spamc processes, or at the very least with 8.  Instead, we
don't approach that until a much higher process count, in the 32-64 range.
