http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5511
Summary: Dns.pm: timeout zero while collecting DNS responses
Product: Spamassassin
Version: 3.2.0
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P3
Component: Libraries
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
While investigating why certain type of spam sometimes does
not receive its deserved URIDNSBL hits and sometimes also
misses RBL lookup results, I noticed the following telltale
pattern:
[42083] dbg: dns: success for 21 of 22 queries
[42083] dbg: dns: timeout for after 0 seconds
The 'after 0 seconds' looked very suspicious, and digging behind
scenes reveals one bug, one formula calling for improvement,
and one enhancement request (and also sprouted a Bug 5509 report).
To understand the bug the following scenario can be considered,
affixing a typical timestamp (time from the start) with each item:
- 1 s shortly after start of processing, the first round of RBL
and other DNS queries gets launched;
- 1..5 s processing continues with rules, razor, dcc, botnet, etc.,
while DNS results begin to drop in;
- 5 s sub harvest_dnsbl_queries is called, it computes $deadline
and enters its while loop, collecting responses;
Note that $deadline is computed from the time the last DNS
query was sent (at 1 s timestamp).
- 5 s Now here is the catch: the $self->{async}->complete_lookups(1)
within the 'while' expression calls registered 'completed_callback',
which (as in Plugin::URIDNSBL.pm) can call start_lookup() again,
producing new queries, $self->{async}->get_last_start_lookup_time
receives a new timestamp, but, alas, the $deadline is not recomputed!
- 6 s The second round of DNS queries times out instantaneously (when
rbl_timeout is 6 or less), and they never get a chance of succeeding,
even if responses arrive almost immediately.
The solution is to recompute $deadline after every call of
{async}->complete_lookups. I'll attach a patch that does it.
Now to the 'formula calling for improvement' part.
Mail::SpamAssassin::Conf man page describes the dynamic reduction of
rbl_timeout timeout, but promises the timeout never falls below 1:
In addition, whenever the effective timeout is lowered due to addi-
tional query results returning, the remaining queries are always
given at least one more second before timing out, ...
That promise is broken on occasion, for example with this particular
type of spam, 22 dns queries are launched, and if I have rbl_timeout
at 5, the dynamic timeout falls to zero for collecting the last dns
reply (which did arrive, but is never reclaimed). And the last reply
is often the one from URIDNSBL checks, sometimes the only worthwhile
hit that such spam receives.
I think the formula needs to be revised: it should not reduce timeout
to near-zero when remaining queries fall to 1, but perhaps to some
fraction (say: one third) of the original rbl_timeout. And regardless,
the one second minimum must be kept.
I'm now using the following formula, which keeps this promise:
my $dynamic = (int($self->{conf}->{rbl_timeout}
* (1 - 0.7*(($total - @left) / $total) ** 2) + 1)
by dropping time limit to no less than 0.3 of the original value,
offset by 0.5 (with additional 0.5 for rounding). I'll include this
in my match.
Now for the last part - a feature request. I think that no attempt
has been made to collect already received DNS responses when
timeout is reached. Given an asynchronous nature of DNS lookups
in this module, I think it would be worthwhile to collect whatever
is still in the IP receive queue after a timeout.
Thanks for the patience of reading this far :)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.