Re: mod_dnsbl_lookup 0.90

2005-08-16 Thread Colm MacCarthaigh
On Mon, Aug 15, 2005 at 11:11:46PM -0500, Jem Berkes wrote:
 I did start to implement software side caching in mod_dnsbl_lookup but it 
 raised questions as to whether it's appropriate to have global scale 
 caching when we're doing connection and request oriented processing.
 
 So I've left caching out of mod_dnsbl_lookup 0.91

That's fair enough, it's a pain to implement. mod_smtpd is unlikely to
make multiple serialised RBL lookups like exim or sendmail on a
per-request basis anyway, so there may not even be as much benefit.

-- 
Colm MacCárthaigh        Public Key: [EMAIL PROTECTED]


Re: mod_dnsbl_lookup 0.90

2005-08-15 Thread Jem Berkes
 That's super inefficient for the majority case, and there's no
 application level caching, which tends to be a must for most
 implementations (even if it is only per-request, like Exim's or

We talked about this on IRC, and it seems the preferred approach is to 
delegate the caching responsibility to an entity that is made purely for 
that purpose, for example DJB's local DNS cache software or even rbldnsd 
(an extremely fast DNSBL server) running locally.

I did start to implement software side caching in mod_dnsbl_lookup but it 
raised questions as to whether it's appropriate to have global scale 
caching when we're doing connection and request oriented processing.

So I've left caching out of mod_dnsbl_lookup 0.91




Re: mod_dnsbl_lookup 0.90

2005-08-14 Thread Justin Erenkrantz
On Fri, Jul 29, 2005 at 10:11:46PM +0100, Colm MacCarthaigh wrote:
 Cool. I'd split dnsbl_zones into ipv4_dnsbl_zones and ipv6_dnsbl_zones
 and have the DnsblZones directive work like;
 
   DnsblIPv4Zones 
   DnsblIPv6Zones 

FWIW, I think it'd be fine to have DnsblZones implicitly be DnsblIPv4Zones.
The IPv6 directive can be explicit. But I think having users always have to
type in IPv4 in the normal case is overkill.  -- justin


Re: mod_dnsbl_lookup 0.90

2005-08-13 Thread Jem Berkes

Cool. I'd split dnsbl_zones into ipv4_dnsbl_zones and ipv6_dnsbl_zones
and have the DnsblZones directive work like;

DnsblIPv4Zones
DnsblIPv6Zones

or similar. IPv6 RBL's do exist, and are incompatible with IPv4 ones, so
it's worth having the support early-on.


I haven't found any examples of IPv6 RBLs. Could you point me to one? What 
I'm finding on the web and usenet is that there is still no established 
standard for IPv6 blocklists. Unless I can find some reference, any 
implementation I make would be a guess so I'm leaving this unsupported 
right now.


Re: mod_dnsbl_lookup 0.90

2005-08-13 Thread Colm MacCarthaigh
On Sat, Aug 13, 2005 at 03:20:10PM -0700, Jem Berkes wrote:
 I haven't found any examples of IPv6 RBLs. 

rbl-plus.hea.net. If you can give me a small fixed IP range, I can
arrange access. 

-- 
Colm MacCárthaigh        Public Key: [EMAIL PROTECTED]


Re: mod_dnsbl_lookup 0.90

2005-08-13 Thread Jem Berkes
Sure, we could support them, but if they are the only one (and without 
public documentation on how to use them) then aren't we making guesses from a 
rare case? I haven't found any public discussion on IPv6 DNSBL 
conventions.


For example, what is the standard for how to place the IPv6 string under 
the DNSBL zone? Are we still using decimal octets? Can you point me 
towards some examples?




Re: mod_dnsbl_lookup 0.90

2005-08-13 Thread Colm MacCarthaigh
On Sat, Aug 13, 2005 at 03:54:25PM -0700, Jem Berkes wrote:
 Sure, we could support them but if they are the only one (and without 
 public documentation on how to use) then aren't we making guesses from a 
 rare case? I haven't found any public discussion on IPv6 DNSBL 
 conventions.

Apart from Exim, I don't think there are any client implementations,
and there's no huge cry for them either. But it is worth bearing in mind,
because one day they will be used by someone, and because RBLs are
IP-version specific.

 For example, what is the standard for how to place the IPv6 string under 
 the DNSBL zone? Are we still using decimal octets? Can you point me 
 towards some examples?

The convention we use is reverse-nibbles.v6.rbl-name. So;

2001:770:18:2::90 in rbl-plus.hea.net. would look like;

0.9.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.0.8.1.0.0.0.7.7.0.1.0.0.2.v6.rbl-plus.hea.net. IN AAAA ::1

There is a pretty clear consensus on reverse nibbles; every IPv6 RBL so
far has used them. The v6 label is our own invention, but it avoids a
permanent collision with *.2.rbl-name, which could equally map to
2.0.0.0/8 in IPv4.
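To make the nibble convention concrete, here is a minimal C sketch that turns an IPv6 address into the query name shown in the example. It assumes POSIX inet_pton() rather than APR, and ipv6_dnsbl_name() is a hypothetical helper, not part of mod_dnsbl_lookup. The zone is passed with its trailing dot.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the IPv6 DNSBL query name: the 32 hex nibbles of the address in
 * reverse order, then a "v6" label, then the zone.
 * Returns 0 on success, -1 if the address does not parse or the output
 * buffer is too small. */
int ipv6_dnsbl_name(const char *addr, const char *zone,
                    char *out, size_t outlen)
{
    unsigned char bytes[16];
    size_t pos = 0;
    int i, n;

    assert(addr != NULL && zone != NULL && out != NULL && outlen > 0);
    if (inet_pton(AF_INET6, addr, bytes) != 1)
        return -1;

    /* Walk the bytes from last to first, emitting the low nibble before
     * the high one, so the address is reversed one hex digit at a time. */
    for (i = 15; i >= 0; i--) {
        n = snprintf(out + pos, outlen - pos, "%x.%x.",
                     (unsigned)(bytes[i] & 0x0f), (unsigned)(bytes[i] >> 4));
        if (n < 0 || (size_t)n >= outlen - pos)
            return -1;
        pos += (size_t)n;
    }

    n = snprintf(out + pos, outlen - pos, "v6.%s", zone);
    return (n < 0 || (size_t)n >= outlen - pos) ? -1 : 0;
}
```

Calling ipv6_dnsbl_name("2001:770:18:2::90", "rbl-plus.hea.net.", buf, sizeof buf) should produce the name in the example above; resolving it (and deciding what an AAAA answer of ::1 means) is left to the caller.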

Using quad-A (AAAA) records instead of A records is also a bit iffy; there is no
clear consensus on this just yet. I guess we'll have to wait for more
IPv6 spam (the number of IPv6 spam senders I've ever seen has only just reached double digits).

-- 
Colm MacCárthaigh        Public Key: [EMAIL PROTECTED]


Re: mod_dnsbl_lookup 0.90

2005-08-13 Thread Jem Berkes
Following up on mod_dnsbl, a new version is nearing completion although I 
have encountered some obstacles that slowed me down. I have taken some of 
Colm's advice to make mod_dnsbl_lookup more flexible and self-sufficient. 
I'm attaching the documentation part of what I'm currently working on. If 
anyone sees any logic problems, please let me know!


I have made a major effort to document this thing sufficiently that anyone 
stumbling upon it won't have to struggle with how the heck to use it. 
Hopefully I will post version 0.91 tomorrow or Monday.


- README follows

A DNSBL or RHSBL is just a form of efficient database that returns a 
simple code (expressed as an IP address) for a given lookup key. The 
lookup key is either an IPv4 or IPv6 address in the case of a DNSBL, or a 
host/sub/domain name in the case of a RHSBL. The database answers either 
with an IP address such as 127.0.0.2, meaning a match, or with NXDOMAIN, 
meaning no match.
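For an IPv4 DNSBL, the lookup key goes into DNS as the four octets in reverse order, prepended to the zone, so 192.0.2.99 checked against cbl.abuseat.org. is looked up as 99.2.0.192.cbl.abuseat.org. A minimal sketch in plain C, assuming POSIX inet_pton(); ipv4_dnsbl_name() is a hypothetical helper, not the module's own code:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the DNSBL query name for an IPv4 address: the four octets in
 * reverse order, followed by the zone (written with a trailing dot so the
 * resolver does not append a local search domain).
 * Returns 0 on success, -1 on a bad address or a short buffer. */
int ipv4_dnsbl_name(const char *addr, const char *zone,
                    char *out, size_t outlen)
{
    struct in_addr in;
    const unsigned char *b;
    int n;

    assert(addr != NULL && zone != NULL && out != NULL && outlen > 0);
    if (inet_pton(AF_INET, addr, &in) != 1)
        return -1;

    /* s_addr is in network byte order, so b[0] is the first octet */
    b = (const unsigned char *)&in.s_addr;
    n = snprintf(out, outlen, "%u.%u.%u.%u.%s",
                 (unsigned)b[3], (unsigned)b[2],
                 (unsigned)b[1], (unsigned)b[0], zone);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

An A record in the answer (such as 127.0.0.2) means the key is in the database; NXDOMAIN means it is not.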


DNSBLs are often used in spam filtering, where the return code 127.0.0.x 
indicates that the lookup key (a relay's IP address) is blacklisted. 
However, the meaning of the information returned by a database is in no way 
limited to this. Sometimes the DNSBL server intends positive matches to be 
whitelisted hosts; other times there are a variety of 127.0.0.x codes each 
meaning something different.


For this reason we discourage the use of the term blacklist or RBL (real
time blacklist) because this is just one use of DNSBLs and RHSBLs.

mod_dnsbl_lookup aims to provide generic and flexible DNSBL and RHSBL 
use without limiting functionality. Each server has its own policy and 
return codes, so you must configure dnsbl_lookup_query appropriately, as 
there is no intrinsic way to know whether something is blacklisted, 
whitelisted, or somewhere in between.


Define servers and codes that you consider positive matches under one
or more chains. This allows you to make independent configurations for
different uses. Note that only IPv4 is supported at the moment.

# This might be under a mod_smtpd virtual server config
<VirtualHost *:25>

# Enable module
DnsblLookups On
#
# Need to get host names for RHSBL lookups to work
# Note that a terminating dot in server names prevents local domain search
HostNameLookups On
#
# The following define positive matches for the chain I call spammers
#
# Any non-failure result from sbl.spamhaus.org is a positive match
DnsblIPv4 spammers  sbl.spamhaus.org.   any
#
# The 127.0.0.2 result from cbl.abuseat.org is a positive match
DnsblIPv4 spammers  cbl.abuseat.org.    127.0.0.2
#
# Only the specific codes 127.0.0.5,6,9 from dnsbl.sorbs.net are positive
# The module caches queries internally, so only one actual DNS query is made
DnsblIPv4 spammers  dnsbl.sorbs.net.    127.0.0.5
DnsblIPv4 spammers  dnsbl.sorbs.net.    127.0.0.6
DnsblIPv4 spammers  dnsbl.sorbs.net.    127.0.0.9
#
# The following define positive matches for the chain I call whitelist
#
# A zone designed for whitelisting, any mail from Canada is positive
DnsblIPv4 whitelist ca.countries.nerd.dk.   127.0.0.2
#
# A local zone we run, customers or partners of ours are positive
DnsblIPv4 whitelist customers.dnsbl any
#
# A chain for RHSBL lookups (distinct from DNSBL chains)
#
RhsblZone spammers  rhsbl.ahbl.org. 127.0.0.2

</VirtualHost>
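The "any" versus specific-code distinction in this configuration could be evaluated roughly as follows once a zone has answered. This is only a sketch: struct dnsbl_rule and rule_matches() are hypothetical names, the answer is assumed to arrive as an array of IPv4 A records, and an empty answer stands for NXDOMAIN. Because the three dnsbl.sorbs.net. lines share a zone, one DNS answer can be checked against all three rules.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* One configuration line such as
 *   DnsblIPv4 spammers  dnsbl.sorbs.net.  127.0.0.5
 * where code is either "any" or a dotted-quad return code. */
struct dnsbl_rule {
    const char *chain;
    const char *zone;
    const char *code;
};

/* Decide whether one rule is a positive match, given the A records
 * returned for that rule's zone (naddrs == 0 stands for NXDOMAIN). */
int rule_matches(const struct dnsbl_rule *rule,
                 const struct in_addr *addrs, size_t naddrs)
{
    struct in_addr want;
    size_t i;

    assert(rule != NULL);
    if (naddrs == 0)
        return 0;                 /* NXDOMAIN: never positive */
    if (strcmp(rule->code, "any") == 0)
        return 1;                 /* any answer at all is positive */
    if (inet_pton(AF_INET, rule->code, &want) != 1)
        return 0;                 /* malformed code: treat as no match */
    for (i = 0; i < naddrs; i++)
        if (addrs[i].s_addr == want.s_addr)
            return 1;             /* one of the returned codes matches */
    return 0;
}
```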

With this configuration, a user could now do a DNSBL_ANYPOSTV_RETFIRST 
query on the spammers chain to see if a host is a spammer (returns 
DNSBL_POSITIVE when the first positive response is encountered). The user 
might also want to do a DNSBL_ANYPOSTV_RETFIRST on the whitelist chain 
and allow through any host that returns DNSBL_POSITIVE, meaning it is 
whitelisted. If the whitelist override should be more stringent, a 
DNSBL_ALLPOSTV_RETEVERY query might be done instead, requiring that every 
single entry in the whitelist chain return a positive result.
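The query modes can be modelled as a single loop over a chain's entries. The sketch below is an assumption about their semantics drawn from the description above, not the module's implementation: the enum values stand in for the README's constants, and a precomputed results array takes the place of real DNS lookups, so the number of queries actually made can be counted.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the README's query-mode constants (values assumed). */
enum dnsbl_query_mode {
    DNSBL_ANYPOSTV_RETFIRST,  /* positive if any entry is; stop at the first */
    DNSBL_ANYPOSTV_RETEVERY,  /* positive if any entry is; check every entry */
    DNSBL_ALLPOSTV_RETEVERY   /* positive only if every entry is */
};

enum { DNSBL_NEGATIVE = 0, DNSBL_POSITIVE = 1 };

/* results[i] is 1 if chain entry i would come back positive; reading it
 * represents one DNS query. *queries receives the number performed. */
int eval_chain(enum dnsbl_query_mode mode, const int *results,
               size_t nentries, size_t *queries)
{
    size_t i, positives = 0;

    assert(queries != NULL && (results != NULL || nentries == 0));
    *queries = 0;
    for (i = 0; i < nentries; i++) {
        (*queries)++;
        if (!results[i])
            continue;
        positives++;
        if (mode == DNSBL_ANYPOSTV_RETFIRST)
            return DNSBL_POSITIVE;     /* short-circuit on first listing */
    }
    if (mode == DNSBL_ALLPOSTV_RETEVERY)
        return (nentries > 0 && positives == nentries)
                   ? DNSBL_POSITIVE : DNSBL_NEGATIVE;
    return positives > 0 ? DNSBL_POSITIVE : DNSBL_NEGATIVE;
}
```

With results {0, 1, 1}, DNSBL_ANYPOSTV_RETFIRST stops after two queries, while both RETEVERY modes make all three. An "all" mode could also stop at the first negative; it is left running here on the reading that RETEVERY means details for every entry are wanted.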


A more lenient admin might instead do a DNSBL_ANYPOSTV_RETEVERY query on 
the spammers chain and do post processing after getting DNSBL_POSITIVE. 
The table returned by the lookup (see below) contains detail on every 
positive match, so the admin may want to only block mail from the host if 
there are at least 2 positive zones. The disadvantage is that this 
requires many extra queries.
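The "at least 2 positive zones" policy is a small post-processing step over the per-zone details. In this hypothetical sketch, a plain array of zone/code pairs stands in for the apr_table_t the README mentions, and distinct zones are counted so that one zone matching several configured codes (as dnsbl.sorbs.net. might) counts only once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One positive match as a RETEVERY-style query might report it:
 * the zone that matched and the code it returned. */
struct zone_hit {
    const char *zone;
    const char *code;
};

/* Count distinct zones among the hits. */
size_t distinct_positive_zones(const struct zone_hit *hits, size_t n)
{
    size_t i, j, count = 0;

    assert(hits != NULL || n == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++)
            if (strcmp(hits[i].zone, hits[j].zone) == 0)
                break;
        if (j == i)
            count++;       /* first time this zone appears */
    }
    return count;
}

/* Block only when at least `threshold` separate zones list the host. */
int should_block(const struct zone_hit *hits, size_t n, size_t threshold)
{
    return distinct_positive_zones(hits, n) >= threshold;
}
```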


The configuration (above) simplifies the client code down to querying a
specific chain using a certain query mode. The functions used are:

dnsbl_lookup_ip(const char* chain, int querymode, apr_sockaddr_t* address,
apr_pool_t* p, server_rec* s, apr_table_t** zonedata)

dnsbl_lookup_domain(const char* chain, int querymode, const char* domain,
apr_pool_t* p, server_rec* s, apr_table_t** zonedata)

With return values:
DNSBL_POSITIVE - Positive match (zonedata has details, if requested)
DNSBL_NEGATIVE - Negative
DNSBL_FAILURE - Generic failure, e.g. DnsblLookups Off or invalid chain

For DNSBLs, you would use dnsbl_lookup_ip() and pass the IP address in the 
apr_sockaddr_t*. 

Re: mod_dnsbl_lookup 0.90

2005-08-03 Thread Jem Berkes
Sorry for the slow replies; our landline phone and internet connection are dead and the
telco [TSX: MBT] won't fix them for a week. Terrible for getting work done.

 Cool. I'd split dnsbl_zones into ipv4_dnsbl_zones and ipv6_dnsbl_zones
 and have the DnsblZones directive work like;
 
   DnsblIPv4Zones 
   DnsblIPv6Zones 

That's a good idea, I suspected IPv6 RBLs might exist :) I'll add the IPv6
support.

 dnsbl_lookup_query() takes an IP address argument as a string, but it
 would probably be a lot better to take it as an apr_sockaddr_t, since
 that's an IP version agnostic format, and is generally the way an Apache
 module would have the address available to it.

The problem this introduces is with RHSBL lookups, which operate on host
names or domain names instead of IP addresses. Would you recommend different
functions for DNSBL (pass an IP) and RHSBL (pass a hostname or domain name)?

 Passing it around in binary format also helps you avoid using sscanf and
 the associated reentrancy problems on many platforms.

I did not know there were reentrancy problems with sscanf. strtok I know.

 The implementation is neat, but it could also do with efficiency being
 in mind, IME (I help run a very large RBL) rbl lookups tend to be a big
 source of latency during request/mail handling and it's worth making the
 effort to go a bit further :) 

Yes, I am going to add some caching for recent queries. I thought at first
that the resolver already does this but as far as I can tell, it does not do
any caching.

 Although the dnsbl_lookup_query() function's output is comprehensive,
 perhaps more useful and efficient would be to supply a framework for
 allowing modules to check DNSBL's in a boolean manner. As-is the code
 scans every registered RBL, even if one flags an address as listed.
 That's super inefficient for the majority case, and there's no
 application level caching, which tends to be a must for most
 implementations (even if it is only per-request, like Exim's or
 sendmail's implementations for example).

I agree. What I've started can probably be taken much further but I want to
put the basic layers there first. I'll split up the code so it will be easier
to modify later to not query all at once.

 Part of the lack of boolean-checking reveals another problem, how are
 other modules supposed to know what constitutes a positive for a
 particular RBL?

What constitutes a positive depends entirely on the particular RBL's policy.
Some RBLs are whitelists themselves, so if an IP or domain matches then it
should NOT be blocked.



mod_dnsbl_lookup 0.90

2005-07-29 Thread Jem Berkes
I've posted it here. I've been testing it with 2.1.6-alpha
http://www.sysdesign.ca/archive/mod_dnsbl_lookup-0.90.tar.gz

The README file should describe everything. This is a module providing an 
optional utility function intended for (but not limited to) mod_smtpd. The 
function allows the user to query DNS based blocklist databases, both DNSBL 
and RHSBL style, for arbitrary data. This can be used for all kinds of 
filtering and anti-spam use, including score systems such as SpamAssassin.

In the case of SMTP the query can be the client's IP, client's host name, 
and the domain used in the sender's address. I know that Rian is currently 
redesigning much of mod_smtpd but for demonstration purposes I have 
included code that brings spam blacklisting functionality to mod_smtpd 0.1. 
If a blacklisted client connects, they will be denied service. It is more 
standard however to have these checks done after RCPT TO, at which time the 
envelope domain can also be checked against RHSBL.

It would be great to hear some feedback as I am very new to writing these 
modules. I've tried to use the proper apr_ calls whenever available.




Re: mod_dnsbl_lookup 0.90

2005-07-29 Thread Colm MacCarthaigh
On Fri, Jul 29, 2005 at 02:23:44PM -0500, Jem Berkes wrote:
 I've posted it here. I've been testing it with 2.1.6-alpha
 http://www.sysdesign.ca/archive/mod_dnsbl_lookup-0.90.tar.gz

Cool. I'd split dnsbl_zones into ipv4_dnsbl_zones and ipv6_dnsbl_zones
and have the DnsblZones directive work like;

DnsblIPv4Zones 
DnsblIPv6Zones 

or similar. IPv6 RBL's do exist, and are incompatible with IPv4 ones, so
it's worth having the support early-on.

dnsbl_lookup_query() takes an IP address argument as a string, but it
would probably be a lot better to take it as an apr_sockaddr_t, since
that's an IP version agnostic format, and is generally the way an Apache
module would have the address available to it.

Passing it around in binary format also helps you avoid using sscanf and
the associated reentrancy problems on many platforms.
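A sketch of the binary, version-agnostic approach suggested here: the query name is built directly from a socket address, choosing the IPv4 or IPv6 zone by address family, with no text parsing at all. Plain POSIX struct sockaddr is used as a stand-in for apr_sockaddr_t; dnsbl_name_from_sockaddr() is hypothetical, and the "v6" label follows the rbl-plus.hea.net. convention described in this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build a DNSBL query name from a binary socket address.
 * Returns 0 on success, -1 on an unsupported family or short buffer. */
int dnsbl_name_from_sockaddr(const struct sockaddr *sa,
                             const char *v4zone, const char *v6zone,
                             char *out, size_t outlen)
{
    size_t pos = 0;
    int i, n;

    assert(sa != NULL && out != NULL && outlen > 0);
    if (sa->sa_family == AF_INET) {
        /* Reversed octets; s_addr is already in network byte order. */
        const unsigned char *b = (const unsigned char *)
            &((const struct sockaddr_in *)sa)->sin_addr.s_addr;
        n = snprintf(out, outlen, "%u.%u.%u.%u.%s",
                     (unsigned)b[3], (unsigned)b[2],
                     (unsigned)b[1], (unsigned)b[0], v4zone);
        return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
    }
    if (sa->sa_family == AF_INET6) {
        /* Reversed nibbles, low nibble of each byte first, then "v6". */
        const unsigned char *b =
            ((const struct sockaddr_in6 *)sa)->sin6_addr.s6_addr;
        for (i = 15; i >= 0; i--) {
            n = snprintf(out + pos, outlen - pos, "%x.%x.",
                         (unsigned)(b[i] & 0x0f), (unsigned)(b[i] >> 4));
            if (n < 0 || (size_t)n >= outlen - pos)
                return -1;
            pos += (size_t)n;
        }
        n = snprintf(out + pos, outlen - pos, "v6.%s", v6zone);
        return (n < 0 || (size_t)n >= outlen - pos) ? -1 : 0;
    }
    return -1;   /* unsupported address family */
}
```

Because the caller hands over the address it already holds in binary form, there is no string round trip and nothing for sscanf to do.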

The implementation is neat, but it could also do with efficiency being
in mind, IME (I help run a very large RBL) rbl lookups tend to be a big
source of latency during request/mail handling and it's worth making the
effort to go a bit further :) 

Although the dnsbl_lookup_query() function's output is comprehensive,
perhaps more useful and efficient would be to supply a framework for
allowing modules to check DNSBL's in a boolean manner. As-is the code
scans every registered RBL, even if one flags an address as listed.
That's super inefficient for the majority case, and there's no
application level caching, which tends to be a must for most
implementations (even if it is only per-request, like Exim's or
sendmail's implementations for example).
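Per-request caching of this kind can be as simple as a small table created with each request and discarded afterwards, so it needs no expiry or locking. A minimal sketch, assuming a fixed-size array (CACHE_SLOTS, struct dnsbl_cache and cached_lookup() are hypothetical) and a stand-in resolver that counts calls instead of doing real DNS:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS    16    /* hypothetical size for one request */
#define DNSBL_NAME_LEN 256

/* Answers already seen during this request, keyed by query name. */
struct dnsbl_cache {
    size_t used;
    struct {
        char name[DNSBL_NAME_LEN];
        int listed;              /* cached result of the lookup */
    } slot[CACHE_SLOTS];
};

/* Return the cached answer for name, or call resolve() on a miss and
 * remember the result if there is room. */
int cached_lookup(struct dnsbl_cache *c, const char *name,
                  int (*resolve)(const char *))
{
    size_t i;
    int listed;

    assert(c != NULL && name != NULL && resolve != NULL);
    for (i = 0; i < c->used; i++)
        if (strcmp(c->slot[i].name, name) == 0)
            return c->slot[i].listed;          /* hit: no DNS traffic */

    listed = resolve(name);
    if (c->used < CACHE_SLOTS && strlen(name) < DNSBL_NAME_LEN) {
        strcpy(c->slot[c->used].name, name);
        c->slot[c->used].listed = listed;
        c->used++;
    }
    return listed;
}

/* Stand-in resolver so the sketch runs without DNS: it counts calls and
 * treats names under the hypothetical zone "listed.example." as listed. */
static int fake_queries;
static int fake_resolve(const char *name)
{
    const char *suffix = "listed.example.";
    size_t n = strlen(name), s = strlen(suffix);

    fake_queries++;
    return n >= s && strcmp(name + n - s, suffix) == 0;
}
```

In a real module the table would more likely be allocated from the request or connection pool; the fixed array just keeps the sketch self-contained.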

Part of the lack of boolean-checking reveals another problem: how are
other modules supposed to know what constitutes a positive for a
particular RBL? It hardly makes sense for them to have to supply another
directive to find this out.

This is a non-trivial problem, but if you're going to have an RBL
implementation it needs to be tackled somewhere, though I can't think of
any easy configuration syntax that wouldn't have mod_rewrite-like
complexity. Though maybe something like;

DnsBlackList chainname IPv4 rbl.mail-abuse.org any
DnsBlackList chainname IPv4 rbl-plus.hea.net   127.0.0.7
DnsBlackList chainname IPv6 rbl-6.hea.net  ::1

would work. Then a boolean function, looking something like;

dnsbl_lookup_chain(chainname, apr_sockaddr_t* address)

could return DNSBL_LISTED if the address resolves at all in the
mail-abuse.org list, if it's 127.0.0.7 in the hea.net one, or, for an
IPv6 address, if it's ::1 in the rbl-6 one; you get the idea. Also that
way it can return as soon as it finds any listing for that chain,
without checking the rest.

Multiple chains mean multiple modules can use it, or that different
chains can be used in different contexts and so on.

Although even this complicated syntax doesn't allow for configurations
like "Deny only if listed in at least two RBLs", which are surprisingly
common amongst the paranoid mail administration community. 

And another one is that sometimes RBLs are used as whitelists, for
example some people use rfc-ignorant.org lists as a whitelist so that
they can know not to try callbacks [read
http://www.cus.cam.ac.uk/~fanf2/hermes/doc/talks/2005-02-eximconf/paper.pdf]

Maybe the closest thing the Apache framework has to supporting this level of
complexity is SetEnvIf. Building complex allow/deny conditions is hard,
and this all requires a solid think.

But all that said, it's interesting stuff, and I can think of a few ways
in which I'd like these kind of complex configurations, and am certainly
willing to help.

 The README file should describe everything. This is a module providing an 
 optional utility function intended for (but not limited to) mod_smtpd. The 
 function allows the user to query DNS based blocklist databases, both DNSBL 
 and RHSBL style, for arbitrary data. This can be used for all kinds of 
 filtering and anti-spam use, including score systems such as spamassassin.

There's also http://www.blars.org/mod_access_rbl.html, for reference,
but its job is different.

-- 
Colm MacCárthaigh        Public Key: [EMAIL PROTECTED]