Re: [Bug 4603] RFE: Apache::SpamD module, to run spamd from httpd

Radoslaw Zielinski Sat, 05 Aug 2006 08:43:59 -0700

Justin Mason <[EMAIL PROTECTED]> [27-07-2006 19:43]:
> Radoslaw Zielinski writes:
>>> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603
>>> ------- Additional Comments From [EMAIL PROTECTED]  2006-07-26 18:17 -------
>> I dislike the idea of using Bugzilla as a replacement for a mailing list
>> (bleh, why doesn't ASF use RT); let's move here, if you don't mind...
> OK, as long as you find the thread on nabble and post a pointer to the
> bug; it's a *lot* easier to track down a BZ discussion 6 months down the
> line, than find a mailing list thread.


Done.

>> [...]
>>> Using IPC::Open3 is a nightmare for portability, btw -- I'm pretty sure it
>>> doesn't work on win32 at least -- but maybe there are other issues there 
>>> anyway?
>> I avoided using shell... well, this can be easily changed.
> Yep, perl's own 'open "...|"' shell escapes are actually more portable.
> sa-update's code is worth looking at, for an example.

But that's still shell -- a source of bugs and nasty surprises...
How about IPC::Run?
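
To illustrate the concern about going through a shell, here is a minimal sketch in Python (chosen only for illustration; it is not part of the patch, and run_without_shell is a made-up name). The child is started from an argument list, so no shell ever parses the arguments and metacharacters such as ";" stay literal:

```python
import subprocess
import sys

def run_without_shell(argv):
    """Run argv directly (no shell involved) and return its stdout as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

# An argument that a shell would split into two commands.
tricky = "report; echo INJECTED"

# The child just echoes its first argument back.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky]
echoed = run_without_shell(child).strip()
print(echoed)
```

The same property is what a list-form call in Perl is meant to give; whether IPC::Run is the right choice for portability is a separate question this sketch does not answer.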

>>> how does it compare to current spamd, in speed terms?
>> 174%, crushes the hacky 0.0002s optimizations like cockroaches.
> ha! I suspect these numbers are without any ruleset, though ;) Also, worth
> noting that spamd does some time-consuming tasks that apache-spamd doesn't
> (like log via syslog).

Only default rules indeed.  I've implemented logging; recent results
(spamd >file 2>&1; no syslog):

    Apache, prefork
  parsed 2000 messages in 00:04:28 (268.642257 s),
  7.4448 msgs/s (447 msgs/min, 26801 msgs/h)

    spamd
  parsed 2000 messages in 00:07:02 (422.312356 s),
  4.7358 msgs/s (284 msgs/min, 17049 msgs/h)
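
As a sanity check, the rates above follow directly from the message count and the elapsed time. A small sketch in Python (for illustration only; the function and variable names are made up, the inputs are the two runs reported above):

```python
def throughput(messages, seconds):
    """Return (msgs per second, per minute, per hour) for one benchmark run."""
    per_second = messages / seconds
    return per_second, per_second * 60, per_second * 3600

# Inputs copied from the two runs reported above.
prefork = throughput(2000, 268.642257)
spamd = throughput(2000, 422.312356)

for name, (s, m, h) in (("Apache, prefork", prefork), ("spamd", spamd)):
    print(f"{name}: {s:.4f} msgs/s ({round(m)} msgs/min, {round(h)} msgs/h)")

# Relative throughput of the prefork run versus standalone spamd.
speedup = prefork[0] / spamd[0]
print(f"prefork / spamd = {speedup:.2f}x")
```

With logging enabled on both sides, this works out to roughly 1.57x.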

[...]
> so prefork.log and worker.log are both using apache-spamd, with
> those MPMs?  That's a pretty excellent speedup.

Yes, spamd.log is from a standalone, stock, shiny brand-new spamd.

[...]
>> That would require opening a file to write at some state, passing the
>> filehandle somehow (global var probably), locking...  If a syslog socket
>> has been requested, I guess separate connections are needed...  Complex
>> and error prone.

> for what it's worth, I'd say:

>   - forget about syslog; apache has its own logging model which doesn't
>     involve that, so we don't have to either ;)

Forgotten.

>   - open ">>" filehandles have atomic writes for inter-process contention,
[...]

I have separated four functions:

  $ grep sub\ log Spamd.pm 
  sub log_connection {
  sub log_start_work {
  sub log_end_work {
  sub log_result {

They build the string to be logged (just like spamd) and call info().
I'd change all but log_result() to dbg(), but that's your call.

So all of it ends up in the error_log, along with the startup messages.
This can be changed easily, just by tweaking these subroutines.
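
On the point quoted above about ">>" filehandles: the idea is that each process opens the log in append mode and emits every entry with a single write. A minimal sketch in Python (illustrative only; append_log_line, the file name and the messages are made up, and this is not how Spamd.pm logs, which goes through info() as described above). On POSIX systems O_APPEND positions each write at the end of the file; that whole lines never interleave is an assumption that tends to hold for short single writes to a local file, not something the sketch enforces:

```python
import os
import tempfile

def append_log_line(path, message):
    """Append one newline-terminated line using a single write() call."""
    data = (message.rstrip("\n") + "\n").encode("utf-8")
    # O_APPEND: the kernel moves the offset to end-of-file before each write.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)  # one call per entry, no buffering layer
    finally:
        os.close(fd)

# Demo in a throwaway directory; messages are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "spamd-demo.log")
    append_log_line(log_path, "connection from localhost")
    append_log_line(log_path, "result: clean message")
    with open(log_path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()

print(lines)
```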

>>> And finally, I think it could do with more documentation and tests ;)  a
>>> lot of that would probably make more sense after the
>>> integration-into-distro question is resolved (e.g. "what README does it
>>> go into").
>> I'd go for separate README.apache to keep things transparent.
> Sure; like the spamc model.  But there has to be other integration into
> documentation, the top-level README, INSTALL, etc. at least.

Sure.

>> Right now, this is written as a PerlProcessConnectionHandler (mod_perl
>> handler for custom protocols).  I just figured out it *can* be done
>> using the more popular HTTP handlers (PerlResponseHandler and friends)
>> and I'm experimenting with it right now.
[...]

Update: I have wasted days pushing mp2 to its limits... and I sort of
bounced off these limits; details are on the mod_perl users list (in
short: connection filters run after the core filter which reads the
headers, and a TransHandler can't really change what Apache thinks about
the protocol used for the connection).  The tricky part turns out to be
the compat layer; it can be done in a clumsy way, with a performance hit
for 1.x clients.

[...]
>>   POST /?method=PROCESS HTTP/1.1
>> The more I think about it, the more I like the idea.
> Wow.  That's scary. ;) I'll have to think about that one.

> I'm not sure I see *sufficient* benefit, in terms of the other parts of
> the code, though.  The two protocols are both very, very simple; I think
> there'd be more code needing to be written to support HTTP (with a new
> URL-based, CGI-style parameter-passing scheme), than the existing lines of
> code for supporting SPAMD!

There is plenty of efficient, tested code implementing HTTP clients.
neon, libcurl, libghttp, w3c-libwww -- that's what I have installed.
Just do a library call, no need to write anything fancy.  CGI-style
parameter-passing would be handy for future extensibility; for now,
/[?&;]method=([A-Z]+)/ would do for the parsing.
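
A quick sketch of what that pattern accepts, written in Python for illustration (extract_method and the "user" parameter are made-up names; only the regex itself comes from the text above):

```python
import re

# Same pattern as in the text: "method" must follow '?', '&' or ';'
# and its value is one or more upper-case ASCII letters.
METHOD_RE = re.compile(r"[?&;]method=([A-Z]+)")

def extract_method(request_target):
    """Return the method name from a request target, or None if absent."""
    match = METHOD_RE.search(request_target)
    return match.group(1) if match else None

print(extract_method("/?method=PROCESS"))
print(extract_method("/?user=nobody;method=CHECK"))
print(extract_method("/"))
```

Note that the pattern is case-sensitive, so a lower-case value such as "method=process" would not match.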

What do we have for SPAMD/1.3?  Just the undocumented libspamc and
M::SA::Client (alpha version, for robustness seek elsewhere).



I thought mod_perl was substantially more powerful.  Considering the
technical difficulties, and the fact that having two protocols in use is
worse than having only one, rolling this out would only make sense if
v2.0 were strongly pushed, with v1.x marked as obsolete and
soon-to-be-unsupported.

Since there's no enthusiasm, I doubt you would do that.  Therefore,
I'll stay with the PerlProcessConnectionHandler, abstracting things
when I see opportunity.

-- 
Radosław Zieliński <[EMAIL PROTECTED]>
