Re: Plugin for filtering based on local criteria

Kevin A. McGrail Tue, 17 Jun 2014 20:08:32 -0700

On 6/17/2014 10:49 PM, Philip Prindeville wrote:

I’ve contributed fixes to Apache itself since 1997 (though not with any 
regularity), but can’t remember if I’ve ever had to furnish a CLA or not.

Of course. Small fixes don't meet the level of non-trivialness to merit a CLA,
but having a CLA on file is a great first step to getting karma in the
meritocracy that is the ASF. If you had a CLA, your name SHOULD be on this
list: http://people.apache.org/committer-index.html#unlistedclas


Sure, opening a bug is fine.

Thanks.


As to your last questions: for someone who doesn’t need the complexity of using 
a DNSBL, doesn’t want the wide scope of a DNSBL, doesn’t want to have to 
configure it, or perhaps just wants a significantly more precise tool to solve 
a very limited problem, local blacklisting lets you do this.

This is great. I would put ALL of this in the pm file so the perldoc includes it.


As an example, we were recently hit by a volley of SPAM from a variety of mail 
relays, but they all had something in common.  All of them contained HTML with 
URL’s pointing to websites hosted by “Solar VPS”, and in particular on the 
subnet 65.181.64.0/18 (in some cases, the web hosts had additional A records on 
the subnet 192.99.0.0/16).

It took a couple of hours to get URIDNSBL configured, scored appropriately, and 
working… and verifying that the ill-behaved hosts had corresponding entries in 
multi.uribl.com without prior understanding of the record encoding also took 
some time (since the use of DNS RR’s is an overloading of their intended use, 
it’s less than intuitive).

When it was all over, it occurred to me that a trivial configuration like:

uri_block_cidr L_BLOCK_CIDR     65.181.64.0/18 192.99.0.0/16
body L_BLOCK_CIDR               eval:check_uri_local_bl("L_BLOCK_CIDR")
describe L_BLOCK_CIDR           Block URI's pointing to bad CIDR's
score L_BLOCK_CIDR              5.20

would be a lot more of a pinpoint fix to my issue, rather than the overly 
generalized approach of using multi.uribl.com. And I didn’t want to score 
everyone that was in that DNSBL, just those particular subnets.
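
As a rough illustration of the check such a rule implies, here is a minimal sketch in Python (not the plugin's Perl, and not its actual code). The helper name ip_is_blocked is hypothetical; the two networks are the ones from the uri_block_cidr line above, and the address is assumed to have been resolved from the URI host already.

```python
import ipaddress

# Networks taken from the uri_block_cidr example above.
BLOCKED_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in ("65.181.64.0/18", "192.99.0.0/16")
]


def ip_is_blocked(addr, networks=BLOCKED_CIDRS):
    """Return True if addr falls inside any of the given networks.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)
```

Note that a /18 starting at 65.181.64.0 ends at 65.181.127.255, so an address such as 65.181.128.1 would not match.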

After that, it occurred to me that I had never seen a legitimate email with a 
URL pointing to Vietnam or Nigeria in my life, and it would be nice to restrict 
those as well.  So the plugin later evolved to:

uri_block_cc L_BLOCK_CC         cn vn ro bg ru ng eg
body L_BLOCK_CC                 eval:check_uri_local_bl("L_BLOCK_CC")
describe L_BLOCK_CC             Block URI's pointing to countries with no CERT or anti-SPAM laws
score L_BLOCK_CC                5.65

In the case of the 65.181.64.0/18 SPAM which provided this call to action, here 
are some subject lines you might recognize:

News alert: you could apply for a CNA education program
Wireless Internet plans online
You've Been Accepted into the Who's Who
Don't overpay for a phone. Try a free* one today
Is your home missing something? How about custom blinds?
Could you study at a CNA education program?
cable service is a possibility

etc. All within a 6-hour span.

Looking at some recent traffic on the SpamAssassin users mailing list, it 
seemed that other people had had a similar idea at the same time to provide 
surgical blacklisting locally.

At this point, I’m thinking of adding whitelisting support to the country, ISP, 
and CIDR blacklists. For example, we’ve had issues with ServerBeach not being 
proactive about spam or even acknowledging complaints in a timely fashion. That 
said, we get legitimate traffic with URLs pointing to a Fedora Project 
resource hosted on one of their networks. So we couldn’t blacklist that entire 
ISP without “punching a hole” for Fedora build reports.

The whitelisting would take individual IP addresses and/or host names as 
they appear in the URLs.
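
One way the "punching a hole" logic could look, as a sketch only: whitelist entries are consulted first and win over any blacklist hit. This is Python rather than the plugin's Perl, and the function name, parameters, and example hosts and addresses are all hypothetical (the addresses come from the documentation range 203.0.113.0/24).

```python
import ipaddress


def should_block(host, addrs, blocked_nets, allowed_hosts=(), allowed_ips=()):
    """Return True if any address of host is blacklisted and not whitelisted.

    host          -- host name as it appears in the URL
    addrs         -- addresses the host resolved to (strings)
    blocked_nets  -- CIDR strings to blacklist
    allowed_hosts -- host names exempt from blocking
    allowed_ips   -- individual addresses exempt from blocking
    """
    # A whitelisted host name is never blocked, whatever it resolves to.
    if host.lower() in {h.lower() for h in allowed_hosts}:
        return False

    allowed = {ipaddress.ip_address(a) for a in allowed_ips}
    nets = [ipaddress.ip_network(n) for n in blocked_nets]

    for addr in addrs:
        ip = ipaddress.ip_address(addr)
        if ip in allowed:
            continue  # hole punched for this specific address
        if any(ip in net for net in nets):
            return True
    return False
```

For example, with 203.0.113.0/24 blacklisted and builds.example.org whitelisted, a URL on builds.example.org resolving to 203.0.113.10 passes, while any other host on that address is still blocked.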

Hope that answers your questions.


On Jun 17, 2014, at 9:24 AM, Kevin A. McGrail <[email protected]> wrote:

Philip,

Do you have a CLA with the ASF? From checking, I don't believe so.  Can you 
please take a look at http://wiki.apache.org/spamassassin/AboutClas

What might help you is that since this is a plugin, we could open a bug, add it 
to trunk, etc. for people to more readily test it. It wouldn't be enabled by 
default but should allow more people to readily implement it and provide 
feedback.

However, I am curious whether you could add a bit more description in the pm 
on why this is good to implement, what type of spam you use it to block, 
etc.?

Regards,
KAM

On 6/15/2014 10:47 PM, Philip Prindeville wrote:

Here’s a first attempt at a module.  I based it on Plugin::URIDetail.

It depends on Net::CIDR::Lite and Geo::IP.  If it detects a valid (though not 
necessarily current) ISP database, it will publish a handler for that. Same 
with the IP-Lite (or licensed IP) database from MaxMind.

We’ve been using the MaxMind database for a couple of years on a commercial 
project with good success.

Currently the filtering is done by country code, ISP name, and explicit CIDR 
blocks.

The last test is the least costly, but also the most fine grained… you can 
configure rules to run in whichever order suits your needs best.

I personally sort by country (cn ru bg vn ro ng ir) and then by ISP (won’t name 
them here, but one of them is Over tHere in France), and lastly by CIDR block.

The only real wart on these plugins is that they all index their databases by 
IP address, and do their own (implicit or explicit) name or IP mapping.  
Obviously, this is both blocking and repetitive.

Not sure why PerMsgStatus.pm can’t do the asynchronous name lookups when 
get_uri_detail_list() runs so we have that handy for each of the plugins.  If I 
had the mappings already available, I’d definitely use that.

That is, instead of having:

hosts => {
    'nqtel.com' => 'nqtel.com'
}

why not instead have:

hosts => {
    'nqtel.com' => [ '107.158.259.74' ]
}

or even both, e.g. [ 'nqtel.com', '107.158.259.74' ] (i.e. the domain at index 
0 followed by the list of A records).
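
To make the proposed shape concrete, here is a small Python sketch (not SpamAssassin code) that builds such a table once so each consumer can reuse it. The names resolve_a and build_hosts are hypothetical, and the default resolver is a plain blocking lookup through the system resolver, not the asynchronous lookup suggested above; a different resolver can be passed in.

```python
import socket


def resolve_a(host):
    """Blocking IPv4 lookup via the system resolver; returns [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def build_hosts(hosts, resolver=resolve_a):
    """Map each host to [domain, addr1, addr2, ...], resolving each host once.

    hosts is a host -> domain mapping, as in the first form shown above.
    """
    table = {}
    for host, domain in hosts.items():
        table[host] = [domain] + list(resolver(host))
    return table
```

With the table built once per message, the country, ISP, and CIDR checks could all read the addresses from index 1 onward instead of each doing its own lookup.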

One other shortcoming I noticed was the somewhat limited list of error returns 
such as MISSING_REQUIRED_VALUE, INVALID_VALUE, INVALID_HEADER_FIELD_NAME… what 
about MISSING_DEPENDENCY or MISSING_RESOURCE?

What if we want to filter on Geo::IP’s ISP database, but the database isn’t 
present?
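
As a sketch of what reporting that condition might look like (Python, for illustration only): MISSING_RESOURCE here is just the name proposed above, not an existing SpamAssassin constant, and check_database and the path are hypothetical.

```python
import os

# Name proposed in this message; not an existing SpamAssassin error code.
MISSING_RESOURCE = "MISSING_RESOURCE"


def check_database(path):
    """Return None if the database file exists and is readable,
    otherwise the MISSING_RESOURCE marker."""
    if os.path.isfile(path) and os.access(path, os.R_OK):
        return None
    return MISSING_RESOURCE
```

A rule that depends on the ISP database could then be rejected at configuration time with a specific reason, rather than silently never matching.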

I don’t do a lot of volume (maybe 10 messages per second peak), so doing 
blocking lookups isn’t a problem.  But obviously this might be an issue for 
some high volume sites.

Feedback is welcome.

-Philip



--
*Kevin A. McGrail*
President

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-359-8451 (fax)
[email protected] <mailto:[email protected]>
