https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7587

            Bug ID: 7587
           Summary: TxRep: blocks on its own locks, intolerably slow as a
                    result
           Product: Spamassassin
           Version: 3.4 SVN branch
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Plugins
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: Undefined

Created attachment 5572
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5572&action=edit
Do not leak storages or hold them open when not checking

It appears to be hardwired into txrep that it holds the reputation database
open 
indefinitely when spamd is running ($self->{txKeepStoreTied} is set to 1 when
not in learning mode, and with that set, we never call finish() on the storage
module). Thus, any attempt to use TxRep with local storage and a spamd with
more than once instance running at once will lead to huge delays as we block on
the tx-reputation.mutex.

But this is not all. Even with only one spamd running, every check_reputation()
will lead to an attempt to reopen the storage, because open_storages() is
called for every address checked without seeing whether the storage is already
open, and unconditionally opens it again, overwriting the default_storage etc
and leaking it. This leads to the remarkable sight of the thing blocking on a
lock it itself holds.

Patching out the only instance where txKeepStoreTied is set to 1 and arranging
to not leak locks leads to a much more desirable state, where multiple spamds
can run in parallel without mutual blocking on this lock unless they both
attempt to update reputation simultaneously. This is still hardly ideal -- we
should probably hold the storage open if it is a SQL storage, and we have dead
code hanging around relating to the txKeepStoreTied variable -- but at least we
now don't leak them like confetti.

Checking times for me go down from ~186s (due to multiple 30s waits for lock
expiry) to about 4s with this patch applied.

(A >180s check time might be considered a denial-of-service problem, only I'm
not sure it counts as one if the triggering method is as easy as "send an
email, any email". With bugs this big outstanding in its default configuration,
is this plugin actually release-quality?)

-- 
You are receiving this mail because:
You are the assignee for the bug.

Reply via email to