https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6458

           Summary: add blacklist_uri_host, whitelist_uri_host; and A
                    record lookups to URIs in URIDNSBL plugin
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
        AssignedTo: [email protected]
        ReportedBy: [email protected]


Here are two enhancements which are mostly unrelated (except that
they both deal with URIs found in a message), but need a common
modification to the underlying support code and data structures,
which is why I'm bundling them in a single ticket. The enhancement
was suggested by AXB, and I think it makes a worthwhile addition.

1. added configuration directives blacklist_uri_host,
whitelist_uri_host and unlist_uri_host to manage a list of
black- or whitelisted host names or domain names as found in URLs
of a message. This is functionally much like specifying 'uri' rules
with sufficiently precise regular expressions, except that
it is much easier to specify, less error-prone than regexps, and
quicker to execute. Consider it syntactic sugar for a common
need. It is supposed to work with user_prefs files ('scoresonly'
switching), although this aspect has not yet been thoroughly tested.
It is mostly implemented in Plugin::WLBLEval and Conf.pm.

2. added tflags 'a' and 'ns' for rules tied to the 'uridnsbl'
directive in Plugin::URIDNSBL (while preserving compatibility).
Traditionally the uridnsbl rules did an NS lookup on domain names
found in URIs, then mapped the name servers to their IP addresses
by an A lookup, and these addresses in turn were sent to DNSBL lists.
Not all DNSBL lists are supposed to be used this way - an example
is the "black_a.txt" list at
http://www.uribl.com/datasets.shtml , which expects to be queried
by IP addresses of hosts in URIs, not by their name servers.
With the addition of both tflags, one may choose one or the other
type of lookup, or even both.

The implementation was complicated by the fact that the underlying
code stripped off host parts to a registrar boundary, which loses
the information needed for the uridnsbl+A lookups as well
as for URI black- and whitelisting. So the change needed to touch
some supporting code, while preserving compatibility.


Example use:


if can(Mail::SpamAssassin::Conf::feature_uri_host_wblist)

blacklist_uri_host wWw.Example.COM example.NET 127.0.0.1
blacklist_uri_host 127.0.0.2
whitelist_uri_host aaa.bbb.example.org edu mil

header URI_HOST_IN_BLACKLIST        eval:check_uri_host_in_blacklist()
describe URI_HOST_IN_BLACKLIST      Host or domain found in URI is blacklisted
tflags URI_HOST_IN_BLACKLIST        userconf noautolearn
score URI_HOST_IN_BLACKLIST 0.1

header URI_HOST_IN_WHITELIST        eval:check_uri_host_in_whitelist()
describe URI_HOST_IN_WHITELIST      Host or domain found in URI is whitelisted
tflags URI_HOST_IN_WHITELIST        userconf noautolearn
score URI_HOST_IN_WHITELIST -0.1

endif


uridnsbl URIBL_TEST  testbl.example.org   TXT
body     URIBL_TEST  eval:check_uridnsbl('URIBL_TEST')
describe URIBL_TEST  Contains a URL listed in the xxx blocklist
tflags   URIBL_TEST  net a




Below are excerpts from the new documentation:



=item uridnsbl NAME_OF_RULE dnsbl_zone lookuptype

Specify a lookup.  C<NAME_OF_RULE> is the name of the rule to be
used, C<dnsbl_zone> is the zone to look up IPs in, and C<lookuptype>
is the type of lookup (B<TXT> or B<A>).   Note that you must also
define a body-eval rule calling C<check_uridnsbl()> to use this.

This works by collecting domain names from URLs and querying DNS
blocklists with an IP address of host names found in URLs or with
IP addresses of their name servers, according to tflags as follows.

If the corresponding body rule has a tflag 'a', the DNS blocklist will
be queried with an IP address of a host found in URLs.

If the corresponding body rule has a tflag 'ns', DNS will be queried
for name servers (NS records) of a domain name found in URLs, then
these name server names will be resolved to their IP addresses, which
in turn will be sent to the DNS blocklist.

The tflags directive may specify either 'a' or 'ns' or both flags. If
neither flag is given, the default is 'ns', which is compatible with
pre-3.4 versions of SpamAssassin.

The choice of tflags must correspond to the policy and expected use of
each DNS blocklist and is normally not a local decision. As an example,
a blocklist expecting queries resulting from an 'a' tflag is a
"black_a.txt" ( http://www.uribl.com/datasets.shtml ).

Example:
 uridnsbl        URIBL_SBLXBL    sbl-xbl.spamhaus.org.   TXT
 body            URIBL_SBLXBL    eval:check_uridnsbl('URIBL_SBLXBL')
 describe        URIBL_SBLXBL    Contains a URL listed in the SBL/XBL blocklist
 tflags          URIBL_SBLXBL    net ns

[...]

=item tflags NAME_OF_RULE ns

The 'ns' flag may be applied to rules corresponding to uridnsbl and uridnssub
directives. Host names from URLs will be mapped to their name server IP
addresses (an NS lookup followed by an A lookup), which in turn will be sent
to blocklists. This is the default when neither the 'a' nor the 'ns' flag is
specified.

=item tflags NAME_OF_RULE a

The 'a' flag may be applied to rules corresponding to uridnsbl and uridnssub
directives. Host names from URLs will be mapped to their IP addresses, which
will be sent to blocklists. When both 'ns' and 'a' flags are specified,
both queries will be performed.
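
The flag handling described above can be illustrated with a small Python
sketch. It is not the plugin's Perl code: the function names lookup_kinds and
dnsbl_query_name are hypothetical, and the reversed-octet query name is the
usual DNSBL convention for IPv4 addresses, assumed here rather than taken from
the patch. No DNS traffic is generated.

```python
import ipaddress


def lookup_kinds(tflags):
    """Return the lookup kinds ('a', 'ns') selected by a rule's tflags string.

    Hypothetical helper: other flags such as 'net' are simply ignored here.
    """
    kinds = {flag for flag in tflags.split() if flag in ("a", "ns")}
    # With neither flag present, fall back to 'ns' (the documented default).
    return kinds or {"ns"}


def dnsbl_query_name(ip, zone):
    """Build a DNSBL query name for an IPv4 address (assumed convention:
    octets reversed, then the zone appended)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


print(sorted(lookup_kinds("net a")))     # only the host's own address
print(sorted(lookup_kinds("net")))       # default: name server addresses
print(sorted(lookup_kinds("net a ns")))  # both kinds of query
print(dnsbl_query_name("192.0.2.10", "testbl.example.org"))
```

With tflags 'a' the address passed to dnsbl_query_name would be that of the
host found in the URL; with 'ns' it would be the address of one of the
domain's name servers.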



[...]

=item blacklist_uri_host host-or-domain ...

Adds one or more host names to a list of blacklisted URI domains.

No wildcards are supported, but subdomains do match implicitly. There is
only one combined list for black- and whitelisting of host names in URIs.
Search starts by looking up the full hostname first, then leading fields
are progressively stripped off (e.g.: sub.example.com, example.com, com)
until a match is found or we run out of fields. The first matching entry
(the most specific) determines if a lookup yielded a blacklisted or a
whitelisted result.

If a URL contains an IP address in place of a host name, the
black- (or white-) list must specify the exact same IP address.

A domain cannot be both blacklisted and whitelisted at the same time; the
last directive prevails. Use the unlist_uri_host directive to neutralize
previous blacklist_uri_host and whitelist_uri_host settings.


=item whitelist_uri_host host-or-domain ...

Adds one or more host names to a list of whitelisted URI domains.
See blacklist_uri_host directive for details.


=item unlist_uri_host host-or-domain ...

Removes one or more specified host names from the list of black- or whitelisted
URI domains. Attempting to remove a name that is not listed is ignored (it is
not an error).
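
The lookup semantics of the three directives can be sketched in Python as
follows. This is only an illustration, not the Perl code in Plugin::WLBLEval
or Conf.pm: the class UriHostList and its method names are hypothetical, and
case-insensitive matching is an assumption suggested by the mixed-case
example (wWw.Example.COM) rather than something the documentation states.

```python
import ipaddress


class UriHostList:
    """One combined list: host -> True (blacklisted) or False (whitelisted)."""

    def __init__(self):
        self._entries = {}

    def blacklist(self, *hosts):
        for host in hosts:
            self._entries[host.lower()] = True   # last directive prevails

    def whitelist(self, *hosts):
        for host in hosts:
            self._entries[host.lower()] = False  # last directive prevails

    def unlist(self, *hosts):
        for host in hosts:
            self._entries.pop(host.lower(), None)  # unlisted names are ignored

    def lookup(self, host):
        """Return 'black', 'white', or None for a host taken from a URI."""
        host = host.lower()
        try:
            ipaddress.ip_address(host)
            candidates = [host]  # IP addresses must match exactly
        except ValueError:
            labels = host.split(".")
            # Full name first, then strip leading fields one at a time.
            candidates = [".".join(labels[i:]) for i in range(len(labels))]
        for name in candidates:
            if name in self._entries:
                return "black" if self._entries[name] else "white"
        return None


hosts = UriHostList()
hosts.blacklist("wWw.Example.COM", "example.NET", "127.0.0.1")
hosts.whitelist("aaa.bbb.example.org", "edu", "mil")
hosts.whitelist("good.example.net")  # more specific than example.net

for name in ("www.example.com", "example.com", "x.good.example.net",
             "bad.example.net", "cs.example.edu", "127.0.0.1", "127.0.0.2"):
    print(name, hosts.lookup(name))
```

Because the most specific entry wins, x.good.example.net comes out
whitelisted here even though example.net is blacklisted.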

