Re: [Bug 4603] RFE: Apache::SpamD module, to run spamd from httpd

Radoslaw Zielinski Thu, 27 Jul 2006 01:46:41 -0700

> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603
> ------- Additional Comments From [EMAIL PROTECTED]  2006-07-26 18:17 -------


I dislike the idea of using Bugzilla as a replacement for a mailing list
(bleh, why doesn't ASF use RT); let's move here, if you don't mind...

[...]
> Using IPC::Open3 is a nightmare for portability, btw -- I'm pretty sure it
> doesn't work on win32 at least -- but maybe there are other issues there 
> anyway?

I avoided using shell... well, this can be easily changed.

> how does it compare to current spamd, in speed terms?

174%, crushes the hacky 0.0002s optimizations like cockroaches.

  $ tail -n1 *.log
  ==> prefork.log <==
  parsed 2000 messages in 00:04:32 (272.930377 s),
  7.3279 msgs/s (440 msgs/min, 26380 msgs/h)

  ==> spamd.log <==
  parsed 2000 messages in 00:08:00 (480.140767 s),
  4.1654 msgs/s (250 msgs/min, 14996 msgs/h)

  ==> worker.log <==
  parsed 2000 messages in 00:04:35 (275.170448 s),
  7.2682 msgs/s (436 msgs/min, 26166 msgs/h)

Apache-spamd / spamd run with -x -m 5, Bench-spamd.pl with -c 3 -m 2000.
Hardware: Athlon 1.7xp, 700MB RAM.
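
For anyone who wants to re-check the ratios, here is a minimal Python sketch that recomputes them from the elapsed times in the log tails above. The function and dictionary names are made up for illustration; only the numbers come from the logs.

```python
# Elapsed seconds for 2000 messages, copied from the log tails above.
MESSAGES = 2000
ELAPSED = {"prefork": 272.930377, "spamd": 480.140767, "worker": 275.170448}


def throughput(seconds, messages=MESSAGES):
    """Messages per second for one benchmark run."""
    return messages / seconds


def relative_to_spamd(name):
    """Throughput of the named run as a multiple of plain spamd."""
    return throughput(ELAPSED[name]) / throughput(ELAPSED["spamd"])


if __name__ == "__main__":
    for name in ("prefork", "worker"):
        rate = throughput(ELAPSED[name])
        print(f"{name}: {rate:.4f} msgs/s, {relative_to_spamd(name):.0%} of spamd")
```

With these inputs the worker MPM comes out at roughly 1.74 times spamd's throughput and prefork at roughly 1.76 times.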

> Regarding logging.  What's the issue?  (I couldn't actually spot any logging
> in that tarball.)

Apache redirects stderr to error_log, I don't know how to capture it
(OTOH, I haven't been looking for it, but I don't think it's a good
idea).  The ErrorLog directive doesn't support redirecting to syslog.

So, all the debug messages from SA and some startup errors detected
at the config phase are logged.  This isn't:

  [5273] info: spamd: connection from localhost [127.0.0.1] at port 2347
  [5273] info: spamd: checking message <[EMAIL PROTECTED]> for (unknown):500
  [5273] info: spamd: clean message (0.0/5.0) for (unknown):500 in 0.2 seconds, 5978 bytes.
  [5273] info: spamd: result: . 0 - scantime=0.2,size=5978,user=(unknown),uid=500,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=2347,mid=<[EMAIL PROTECTED]>,autolearn=disabled

I have not attained enlightenment about the correct way to do it yet.

That would require opening a file to write at some stage, passing the
filehandle somehow (global var probably), locking...  If a syslog socket
has been requested, I guess separate connections are needed...  Complex
and error prone.
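
To make the file-based variant concrete, here is a minimal sketch of the "open, lock, append" step, written in Python rather than the Perl that mod_perl would need. It assumes a Unix system with advisory flock() locks and an illustrative helper name, append_log_line; it leaves out syslog and the question of how the handle reaches the handler.

```python
import fcntl


def append_log_line(path, line):
    """Append one line to a shared log file under an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as log:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(log, fcntl.LOCK_EX)
        try:
            log.write(line.rstrip("\n") + "\n")
            # Push the data to the OS before the lock is released.
            log.flush()
        finally:
            fcntl.flock(log, fcntl.LOCK_UN)
```

Each child would call something like append_log_line("/tmp/spamd-access.log", "[5273] info: spamd: clean message (0.0/5.0)") once per connection; the path is only an example.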

Adding complexity is easy, keeping it simple and obvious makes a worthy
challenge.

> Should it be integrated into the main distro, or kept as a separate module
> with its own Makefile.PL, do you think?  (I think I'd prefer to integrate, if
> possible.)

If it's not integrated... will be lost, in time.

> And finally, I think it could do with more documentation and tests ;)  a lot
> of that would probably make more sense after the integration-into-distro
> question is resolved (e.g. "what README does it go into").

I'd go for separate README.apache to keep things transparent.


Right now, this is written as a PerlProcessConnectionHandler (mod_perl
handler for custom protocols).  I just figured out it *can* be done
using the more popular HTTP handlers (PerlResponseHandler and friends)
and I'm experimenting with it right now.

That would have two benefits I see right now (I doubt it'd change
anything regarding performance).

The first one is the possibility of using mod_log_config (the CustomLog
directive).  If we agree to compress those four log lines per connection
into one, it would be a clean and efficient way to get the access logging
done.

Second one...  Well, here it is; try to keep an open mind. ;-)
I'm reading http://catb.org/esr/writings/taoup/ right now; around the
chapter about protocol design it bugged me: why isn't the spamd protocol
based on HTTP?

Gain: forget the fancy libspamc, forget Mail::SpamAssassin::Client, get
over with parts of spamd network-related code ("sysread not ready"
anyone?), reduce trash code in various spamc implementations (exim,
whatever)...  Just use an HTTP library to do a simple POST (and make sure
the library allows you to read the Spam header after a 2xx response).

So.  If I used the mod_perl HTTP handlers, that would get us very close
to rolling out the SPAMD/2.0 protocol [1].  After some code refactoring,
it'd be possible to use spamd as FastCGI (or regular CGI, if someone
wishes) with any HTTP server.  Authentication?  Just get a mod_auth*
module.  Compression?  mod_deflate.  Whatever?  mod_whatever.

  POST /?method=PROCESS HTTP/1.1

The more I think about it, the more I like the idea.
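
As a rough illustration of how thin a client could get, here is a Python sketch that POSTs a message with the standard http.client module and reads the Spam header from the reply. Nothing here is an existing interface: the stub server, the check_message and demo names, the Content-Type and the exact header value are assumptions made only so the example runs on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubSpamdHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an HTTP-based spamd; it scans nothing."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        # Assumed header layout, modelled on the current spamd reply.
        self.send_header("Spam", "False ; 0.0 / 5.0")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # echo the message back unchanged

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def check_message(host, port, message):
    """POST a raw message and return (Spam header value, response body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/?method=PROCESS", body=message,
                     headers={"Content-Type": "message/rfc822"})
        resp = conn.getresponse()
        payload = resp.read()
        if not 200 <= resp.status < 300:
            raise RuntimeError(f"unexpected status {resp.status}")
        return resp.getheader("Spam"), payload
    finally:
        conn.close()


def demo():
    """Start the stub on a free local port, send one message, shut down."""
    server = HTTPServer(("127.0.0.1", 0), StubSpamdHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return check_message("127.0.0.1", port, b"Subject: hi\r\n\r\nhello\r\n")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client side is the short check_message function; everything else exists only to give it something to talk to.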


[1] Actually, it would probably be easier to implement SPAMD/2.0 and add
    the compatibility layer.

-- 
Radosław Zieliński <[EMAIL PROTECTED]>
