http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603





------- Additional Comments From [EMAIL PROTECTED]  2006-08-22 12:07 -------
(In reply to comment #9)
> I'm curious to hear Radoslaw's comments as to why he thinks this is faster
> than running spamd the traditional way. What sorts of bottlenecks did this
> clear?

I didn't clear any obvious bottlenecks.

Spamd code has evolved for years, patch after patch, without a large general
cleanup, so it has become rather messy.  Since I had the luxury of having a
reference implementation to work with, I could plan how I wanted it organised,
with a more general view.

I have moved code around, been very careful about copying data in memory, and
avoided a few regexps / assignments / control statements... but that gave very
little speed-up, if any.  I focused on correctness and clarity; speed was
supposed to be a side effect.

Spamd could be just a heavy CGI script if it didn't use its own protocol
(a design error IMO; it should have been HTTP).  Now, if you have a CGI script,
you don't write your own code to read data from the network and manage
children, but rather leave it to an HTTP server, don't you?  That is where the
performance gain comes from, I believe.

> Did we learn anything here that could more 
> appropriately be applied to spamd instead of relying on apache?

Yes, to reuse code.  Unfortunately, the SA dev team believes in reinventing
the wheel in the name of easy installation. ;-)  There is strong resistance
against creating new dependencies; see bug #4964 for an example.

No, the speed optimisations from [EMAIL PROTECTED] which might be applied to
standalone spamd are very minor.

I suspect that working on the child management code might give good results.
IMO, the right way is to switch to something like Net::Server (and optimise
it if needed).

> In the test results presented in comment #3 we see that the prefork method
> was more efficient than the worker thread. Did you try running spamd with
> --round-robin to see if similar results are found there?

No, you're welcome to do that. :-)  I doubt it would change much.

> Looking through the RC2 code I'm wondering if you have a short list of
> things you think you'd do differently if you felt inclined to rework this.

[EMAIL PROTECTED]: there is plenty of room for improvement, but I believe it's
a solid code base.  I have reworked it back and forth for the last two months;
there is no need to do it again.

spamd: if the SA developers are positive about it, I'll patch it to use
Mail::SpamAssassin::Spamd and Mail::SpamAssassin::Spamd::Config.
I'd kill the custom protocol and switch to HTTP, but that won't be done
(discussed in comment #6).

SpamAssassin: I'd kill all the performance hacks and introduce a strict OO
interface.  I believe that would yield a major performance gain in the long
run.  Also, config namespaces would be useful; something like:

my $sa = Mail::SpamAssassin->new;
my $user_config = Mail::SpamAssassin::Config->new(data => $lines);
my $result = $sa->parse->check(add_config => [ $user_config ]);

Then a setting could be looked up in this order:
  @{ $self->{add_config} }
  site config
  defaults
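
To make the lookup order concrete, here is a minimal sketch in plain Perl.
It assumes each config layer is just a hash reference; lookup_setting() is a
hypothetical helper for illustration, not an existing SpamAssassin API, and
the example values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::SpamAssassin): return the value of
# $key from the first layer that defines it, or nothing if no layer does.
sub lookup_setting {
    my ($key, $add_config, $site_config, $defaults) = @_;

    # Per-call configs first, in the order they were passed in,
    # then the site config, then the built-in defaults.
    for my $layer (@{ $add_config || [] }, $site_config, $defaults) {
        next unless ref $layer eq 'HASH';
        return $layer->{$key} if exists $layer->{$key};
    }
    return;    # not set in any layer
}

# Illustrative values only.
my %defaults    = (required_score => 5.0, report_safe => 1);
my %site_config = (required_score => 4.0);
my %user_config = (required_score => 6.5);

# The user layer overrides the site config for required_score ...
print lookup_setting('required_score', [ \%user_config ], \%site_config, \%defaults), "\n";
# ... while report_safe falls through to the defaults.
print lookup_setting('report_safe', [ \%user_config ], \%site_config, \%defaults), "\n";
```

With layers kept separate like this, a per-user config never has to be
merged into (and later removed from) the shared site config.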

> RE: comment #5. For whatever weight my newbie status carries I think this
> should be a separate package at this point. For one it requires a host of
> new dependencies to run. Also, there's simply no way it's as stable as it
> will be in a few months.

If it's not integrated, it will never be used or become stable.  It can live
in its own directory and be installed only if mod_perl is available.

Yes, it's probably buggy; I have not received a single success or failure
report, so it has probably only been tested by me.



