Re: [spamdyke-users] feature requests :)

2008-04-12 Thread Andras Korn
On Tue, Apr 08, 2008 at 11:40:48PM -0500, Sam Clippinger wrote:

 Andras Korn wrote:
  On Tue, Apr 08, 2008 at 12:38:38AM -0500, Sam Clippinger wrote:
  blacklist connections at the TCP level. tcpserver (or its replacement)
  should only set appropriate environment variables based on the tests it
  has carried out and leave blacklisting to the next program in the chain,
  which speaks the application layer protocol concerned (SMTP in this case).
  
  This way, spamdyke needn't duplicate the work of tcpserver and can still
  obtain envelope information before refusing the mail.
 
 I think we're just going to have to agree to disagree on this.  I 

OK. :)

 way.)  The fact that environment variables are not easily visible to 
 external viewers is a show-stopper for me.

I still think, though, that as long as their contents can be logged by the
child process that uses them, this is a non-issue.
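
As a rough illustration of that point, a child process could write the
variables it acts on to its log at startup. This is a minimal sketch, not
spamdyke code: the function name describe_environment and the choice of
variables are assumptions (TCPREMOTEIP and TCPREMOTEHOST are set by
tcpserver, RBLSMTPD is the variable rblsmtpd reads).

```python
import os
import sys

# Variables the child bases its decisions on (an illustrative selection).
WATCHED = ("TCPREMOTEIP", "TCPREMOTEHOST", "RBLSMTPD")


def describe_environment(env, names=WATCHED):
    """Return one log line that distinguishes unset from set-but-empty."""
    parts = []
    for name in names:
        if name in env:
            parts.append(f"{name}={env[name]!r}")
        else:
            parts.append(f"{name}=<unset>")
    return " ".join(parts)


if __name__ == "__main__":
    # stderr is used on the assumption that the supervisor captures it.
    print(describe_environment(os.environ), file=sys.stderr)
```

With a line like this in the log, the conditions a connection ran under can
be reconstructed afterwards without inspecting the running process.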

  That's certainly one possible conclusion, but if you look at it this way,
  you could just as well implement your own SSL support or C library. IMHO it
  is perfectly acceptable for some features to only be available on some
  systems.
  
  I think the Unix Way is to re-use as much of what already exists as possible
  in order to minimize duplication of effort and to better focus development.
  A single tool should do a well-defined set of related things, and do that
  exceptionally well, while providing the ability to combine effectively with
  other tools. I wouldn't want ls(1) to have the features of find(1) just to
  be self-sufficient.
 
 implement in a reasonable timeframe.)  As for reimplementing the C 
 library, I have no problem with writing new code when the existing 
 library doesn't meet my needs.

Sure; the point I was trying to make is that it's often not a good idea to
write your own code just for the sake of doing so if there is existing code
that would do the job.

Some reasons:

- you may introduce bugs that are not present in the existing implementation
  (because you're working with something you're not yet experienced in, this
  is arguably pretty likely);

- fewer eyeballs will see your code than that of the existing standard
  implementation, which means that security problems will probably be
  noticed later;

- you don't automatically benefit from the continued development of the
  standard implementation.

 (Interestingly, DJB reimplemented portions of the C library when he wrote
 qmail and its supporting tools. For example, most of the memory allocation
 routines and string functions used by qmail are DJB's code, not the system
 library's.)

I know, and part of me hates DJB for that. It makes his code a nightmare to
modify, and modifications very hard to get right.

At least he did it for good reasons, but with blatant disregard for
everyone else. It's one of the reasons why qmail and most other djbware
aren't more open.

 Your description of The Unix Way could just as easily be described as 
 The Reusable Way or The Library Way or even The Free Software Way.  As 

True.

 well with it.  As I get requests for changes to work with a 
 project/patch I've never heard of, I try to determine if the changes 
 will benefit a large enough audience to be worthwhile.  So far, you are 
 the first person to request that spamdyke support environment variables 
 the way rblsmtpd does.  If other people also request it, I'll reconsider 
 my position.

I understand. If I can find the time, I may write a patch that does what I
want.

  I don't think of myself as a programmer and I teach Unix system
  administration at a university. I may just have been lucky so far, but most
  of my students know about strace and are able to use /proc even before they
  enrol for my course.
  
  I think you're being unfair to system administrators here, or are including
  untrained computer operators in the term.
  
  As I see it, the important thing however is that specialist software
  shouldn't be designed to meet the needs of laymen; it should be built to
  best support the trained expert while being as useful as possible to those
  with less than expert knowledge but a willingness to learn.
 [snip]
  I wouldn't put anyone to whom daemon processes are mysterious in charge of a
  critical service.
 
 I certainly mean no disrespect to you, your students or any other system 
 administrator.

I know, I was just amazed that you think of sysadmins as, let us say, less
than well trained for their jobs.

 However, on the mailing lists and forums I read, I see many questions from
 administrators who are obviously not familiar with the tools you've
 mentioned.

This is a pretty sad state of affairs, but I don't think it's reason enough
to develop system software that meets the needs of laymen at the expense of
experts. Spamdyke is not an end-user tool, after all. If someone doesn't
have the skills necessary to use it, they should acquire them, and it
shouldn't be the software that is made less efficient or dumber to
accommodate 

Re: [spamdyke-users] feature requests :)

2008-04-08 Thread Andras Korn
On Tue, Apr 08, 2008 at 12:38:38AM -0500, Sam Clippinger wrote:

 That makes sense, when it is explained in that way.  However, I still 
 don't find it intuitive, because it requires an explanation.  The flag 
 I've implemented in the next version of spamdyke will look like this:
   filter-level=normal
   filter-level=allow-all
   filter-level=require-auth
   filter-level=reject-all
 If any of those lines are present in a configuration file, I believe no 
 explanation is required to understand their basic effect.  The same is 
 not true of an environment variable; that's why I don't like it.

Well, if FILTER_LEVEL were an environment variable with the above values
supported, I would see no fundamental difference in intuitiveness.
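
To make the comparison concrete, here is a minimal sketch of reading such a
variable. FILTER_LEVEL is the hypothetical variable named above, not
something spamdyke is known to support; the four values are the ones from
the quoted filter-level option, and the function name is an assumption.

```python
import os

# The four values come from the filter-level option quoted above.
VALID_LEVELS = ("normal", "allow-all", "require-auth", "reject-all")


def filter_level_from_env(env, default="normal"):
    """Read the hypothetical FILTER_LEVEL variable, falling back to default."""
    value = env.get("FILTER_LEVEL")
    if value is None:
        return default
    if value not in VALID_LEVELS:
        raise ValueError(f"unknown FILTER_LEVEL: {value!r}")
    return value


if __name__ == "__main__":
    print(filter_level_from_env(os.environ))
```

Rejecting unknown values outright, rather than silently falling back, keeps
a typo in the parent's configuration from going unnoticed.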

  However, if other tools already set this variable, I can make spamdyke 
  use it to allow better compatibility.  Since I don't (and won't) use 
  this feature myself, whether I implement it is up to everyone here.  If 
  people want it, I'll add it.
  
  Would you integrate a patch with this functionality?
 
 Only if other code already exists that sets this environment variable 
 (e.g. a tcpserver replacement).

In fact, tcpserver can do it itself, based on tcp.cdb. But so can ipsvd.

  I've implemented a flag in the next version of spamdyke that will 
  function the way you describe the WHITELIST variable.  It doesn't use 
  an environment variable but it is otherwise identical.  It also has a 
  
  The idea with the environment variable is that it can be set/unset using an
  arbitrarily flexible or complex mechanism outside spamdyke, based on
  arbitrary criteria. I don't see how you can duplicate that in any other way.
 
 If the parent daemon (e.g. tcpserver) can alter the environment for its 
 children based on arbitrary criteria, why can't it alter spamdyke's 
 command line instead?

You know that tcpserver can't. ipsvd sort of can, but that solution doesn't
scale well (it would boil down to spawning a script that starts spamdyke
with a different command line for each connection).

Support for environment variables exists and is scalable.

 I'm getting the impression you're describing software that hasn't been
 written yet anyway, so the environment doesn't have to be the only way to
 communicate with child processes.

No, I'm totally writing about existing software here.

  Much of what you're doing in spamdyke is duplicating functionality that
  could be (and is) provided by a tcpserver replacement. For example,
  blacklisting IP addresses and rdns domains could be trivially accomplished
  using tcpsvd and environment variables; no need for these kinds of
  blacklists in spamdyke.
[...]
 All very true.  tcpserver does indeed provide the TCPREMOTEHOST 
 environment variable, which spamdyke ignores.  tcpserver also parses 
 /etc/tcp.smtp.cdb but spamdyke ignores its efforts and reparses 
 /etc/tcp.smtp anyway.

Note that the two aren't necessarily equivalent. tcp.smtp.cdb is updated
atomically; the same is not true for tcp.smtp, so spamdyke may read a
half-edited tcp.smtp while tcpserver still sees the old, consistent cdb. I
don't think re-using the source of a generated binary file in this way is a
clean solution, but I won't argue this point because I don't use tcp.smtp at
all anyway (ipsvd also supports a different configuration scheme, see
ipsvd-instruct(5)).
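
For readers unfamiliar with what "updated atomically" means here: the usual
technique is to write the new contents to a temporary file and rename it
over the old one, so a reader sees either the complete old file or the
complete new one. The sketch below shows that general technique in Python;
it is not tcprules itself, and the function name is an assumption.

```python
import os
import tempfile


def replace_atomically(path, data):
    """Write data to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the rename
    # stays on one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        os.unlink(tmp_path)
        raise
```

A file edited in place offers no such guarantee, which is the difference
being pointed out above.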

 There are several reasons I'm implementing these features in spamdyke 
 and duplicating the effort put into tcpserver (and others).  Efficiency 
 is not always my top priority.

I don't think this is about efficiency; it's more about clean separation of
duties.

 First, there are some situations where spamdyke must perform duplicate 
 work in order to achieve the correct result.  SMTP AUTH is the best 
 example -- authenticated users are allowed to bypass all filters.  If 
 blacklisting takes place before spamdyke is invoked, authenticated users 
 will be incorrectly blacklisted.  This is one of rblsmtpd's major failings.

I completely agree. As I wrote in my previous message, it is undesirable to
blacklist connections at the TCP level. tcpserver (or its replacement)
should only set appropriate environment variables based on the tests it
has carried out and leave blacklisting to the next program in the chain,
which speaks the application layer protocol concerned (SMTP in this case).

This way, spamdyke needn't duplicate the work of tcpserver and can still
obtain envelope information before refusing the mail.
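
The ordering argued for here can be sketched as follows: the parent only
records what it found, and the SMTP-speaking child decides once it knows
whether the client authenticated. This is an illustration under
assumptions, not spamdyke or tcpserver code; the variable name BLACKLIST
and the function decide are hypothetical.

```python
def decide(env, authenticated):
    """Return "accept" or "reject" for one SMTP session.

    env is the environment inherited from the parent (e.g. tcpserver);
    authenticated says whether the client completed SMTP AUTH.
    """
    # Authenticated users bypass all filters, so this check comes first.
    if authenticated:
        return "accept"
    # BLACKLIST is a hypothetical marker set by the parent; the parent
    # itself never drops the connection.
    if "BLACKLIST" in env:
        return "reject"
    return "accept"
```

Because the decision is made inside the SMTP dialogue, a blacklisted address
that authenticates is still accepted, which avoids the rblsmtpd failing
described above.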

 Second, most qmail servers use DJB's tcpserver.  Many replacements may 
 be available but none are in wide use.  For that reason, I must design 
 spamdyke for the lowest common denominator of qmail configurations. 
 If I make spamdyke dependent on an alternative daemon, spamdyke's 
 popularity will immediately drop to (almost) zero.

I don't think I recommended or requested any change that would make spamdyke
dependent on any alternative to tcpserver; at least it wasn't my intention.

 try it quickly, see if it works and remove it just as quickly.  I am a 
 qmail expert yet I still 

Re: [spamdyke-users] feature requests :)

2008-04-06 Thread Sam Clippinger
Wow; thanks for the suggestions!  I'll respond to each one inline below...

KORN Andras wrote:
 Hi,
 
 I've just tried spamdyke and like it so far.
 
 I have a few ideas for new features and some comments.
 
 * I think spamdyke would make an even more seamless replacement for rblsmtpd
 if it supported the RBLSMTPD environment variable in roughly the same way as
 rblsmtpd itself; that is, if it's set but empty, skip RBL checks; if it's
 set to a string, reject the mail temporarily with the given string as the
 error message sent to the client; and if the string begins with a hyphen,
 reject the message permanently with the string sans the hyphen as the error
 message sent to the client. If the variable is unset, just filter normally.
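
The four cases described above can be sketched as a small function. This
is only an illustration of the semantics as stated in this message, not
rblsmtpd's or spamdyke's code; the function name and the action labels are
assumptions.

```python
def rblsmtpd_action(env):
    """Map the RBLSMTPD variable to an (action, message) pair."""
    value = env.get("RBLSMTPD")
    if value is None:
        # Unset: filter normally.
        return ("filter", None)
    if value == "":
        # Set but empty: skip the RBL checks.
        return ("skip", None)
    if value.startswith("-"):
        # Leading hyphen: permanent rejection, message without the hyphen.
        return ("reject_permanent", value[1:])
    # Any other string: temporary rejection with that message.
    return ("reject_temporary", value)
```

For example, rblsmtpd_action({"RBLSMTPD": "-Go away"}) yields
("reject_permanent", "Go away"), while an empty environment yields
("filter", None).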

I wasn't aware rblsmtpd included this feature.  I'm a little hesitant to 
duplicate it because of its design -- the existence of the environment 
variable and its effect on rblsmtpd's behavior are very non-intuitive. 
In particular, using the first character to signal a temporary/permanent 
rejection code is too obscure for my taste.

However, if other tools already set this variable, I can make spamdyke 
use it to allow better compatibility.  Since I don't (and won't) use 
this feature myself, whether I implement it is up to everyone here.  If 
people want it, I'll add it.

 It would be similarly desirable to specify custom error messages in
 blacklist files.

I've already added custom rejection messages to the next version of 
spamdyke.  The rejection message for each filter can be overridden in 
the configuration file (or on the command line).

 * Other environment variables could be supported in a similar way; e.g. if
 WHITELIST is set, skip all spam tests and allow the mail through; or
 perhaps even selectively enable/disable some tests based on envvars (which
 can be set by tcpsvd or a replacement).

I've implemented a flag in the next version of spamdyke that will 
function the way you describe the WHITELIST variable.  It doesn't use 
an environment variable but it is otherwise identical.  It also has a 
setting to block all messages (like an analogous BLACKLIST variable).

 I think this is more in line with The Qmail Way than using your own
 list-files and may also save resources (because you don't have to sift
 through half a dozen lists, just consult a handful of environment variables
 that were set by your parent process).

I understand your suggestion although I must admit I don't hold The 
Qmail Way in very high regard.  Too much of qmail is obscure, 
undocumented and/or only configurable by applying patches.  That's just 
my opinion though.

I don't like passing values to child processes in environment variables 
because they're not externally visible.  In other words, when an 
environment variable is set, only the child process can read it.  If the 
child process doesn't behave correctly, it's difficult/impossible to 
figure out why (or to reproduce the conditions for troubleshooting).  On 
the other hand, when the configuration is set through command line flags 
or configuration files, it's very easy to see what's happening. 
Configuration, testing and troubleshooting are much easier.

So, as with the RBLSMTPD environment variable, I would be willing to 
implement environment variable-based configuration in spamdyke in order 
to work with existing tools.  But I don't think implementing new 
features this way is a good idea.

When it comes to changing spamdyke's configuration for each connection, 
I think the next version of spamdyke will do what you want.  I've 
implemented a system where the configuration can be changed based on the 
incoming IP address, the incoming rDNS name, the sender address or the 
recipient address (or any combination of those four attributes).

 * As for filtering invalid recipients, I think the approach implemented by
 the SPAMCONTROL patch for qmail is feasible, at least for smallish
 installations: list all valid recipient addresses in a file (with wildcards
 supported), and block everything else.
[snip]
 recipient-whitelist-file isn't the same, because if a line is matched, all
 spam tests are skipped. With badrcptto, this is just an additional test:
 does the recipient exist?
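
A minimal sketch of the kind of check described above, assuming shell-style
wildcards and case-insensitive matching. Neither the pattern syntax nor
the function name is taken from SPAMCONTROL or spamdyke; both are
assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def recipient_is_valid(recipient, patterns):
    """Return True if recipient matches any pattern in the valid list."""
    # Lowercasing both sides is a simplification for this sketch.
    address = recipient.lower()
    return any(fnmatchcase(address, pattern.lower()) for pattern in patterns)
```

With patterns such as "postmaster@example.com" and "*@lists.example.com",
a non-matching recipient would simply fail this one test; a match does not
skip the remaining spam tests, which is the difference from
recipient-whitelist-file noted above.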

Several other people have suggested storing a list of valid usernames in 
a file and I don't like that idea for several reasons.  First, it 
doesn't work for large sites.  spamdyke is being used on mail servers 
that host tens of thousands of domains.  The files would be too big, too 
difficult to maintain and too slow to search.  Second, how do you create 
and maintain the list of valid addresses?  Doing it by hand is not 
practical.  If there is a way to do it (correctly) from a script, please 
send me that script -- it contains all the logic I need to implement 
real recipient validation in spamdyke.  Third, if I add recipient 
validation by checking lists in files, I must continue to support it in 
the future, even if I later add real validation.  I'd