http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603
------- Additional Comments From [EMAIL PROTECTED] 2006-08-22 12:07 -------
(In reply to comment #9)
> I'm curious to hear Radoslaw's comments as to why he thinks this is faster
> than running spamd the traditional way. What sorts of bottlenecks did this
> clear?
I didn't clear any obvious bottlenecks.
Spamd code has evolved for years, patch after patch, without a large general
cleanup, so it has become rather messy. Since I had the luxury of having a
reference implementation to work with, I could plan how I wanted it organised
with a more general view.
I have moved code around, been very careful about copying data in memory, and
avoided a few regexps / assignments / control statements... but that gave very
little speed-up, if any. I focused on correctness and clarity; speed was
supposed to be a side effect.
Spamd could be just a heavy CGI script if it didn't use its own protocol
(design error IMO; it should have been HTTP). Now, if you have a CGI script,
you don't write your own code to read data from the network and manage
children, but rather leave it to an HTTP server, don't you? That is where the
performance gain comes from, I believe.
> Did we learn anything here that could more
> appropriately be applied to spamd instead of relying on apache?
Yes, to reuse code. Unfortunately, the SA dev team believes in reinventing
the wheel in the name of easy installation. ;-) There is strong resistance
against creating new dependencies; see bug #4964 for an example.
No: the speed optimisations from [EMAIL PROTECTED] that could be applied to
standalone spamd are very minor.
I suspect that working on the child management code might give good results.
IMO, the right way is to switch to something like Net::Server (and optimise
it if needed).
> In the test results presented in comment #3 we see that the prefork method
> was more efficient than the worker thread. Did you try running spamd with
> --round-robin to see if similar results are found there?
No, you're welcome to do that. :-) I doubt it would change much.
> Looking through the RC2 code I'm wondering if you have a short list of
> things you think you'd do differently if you felt inclined to rework this.
[EMAIL PROTECTED]: there is plenty of room for improvement, but I believe it's
a solid code base. I have reworked it back and forth for the last two months,
so there is no need to do it again.
spamd: if the SA developers are positive about it, I'll patch it to use
Mail::SpamAssassin::Spamd and Mail::SpamAssassin::Spamd::Config.
I'd kill the custom protocol and switch to HTTP, but that won't be done
(discussed in comment #6).
SpamAssassin: I'd kill all the performance hacks and introduce a strict OO
interface. I believe that would give a major performance gain in the long
run. Also, config namespaces would be useful; something like:
    my $sa = Mail::SpamAssassin->new;
    my $user_config = Mail::SpamAssassin::Config->new(data => $lines);
    my $result = $sa->parse->check(add_config => [ $user_config ]);
Then a setting could be looked up in this order:
    @{ $self->{add_config} }
    site config
    defaults
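A minimal sketch of that lookup order, assuming each config layer is just a
plain hash. The helper lookup_setting() and the three variables are
hypothetical and not part of Mail::SpamAssassin; the setting names are used
only as sample keys with made-up values.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Mail::SpamAssassin API): return the value of
# $key from the first layer that defines it, or nothing if no layer does.
sub lookup_setting {
    my ($key, @layers) = @_;
    for my $layer (@layers) {
        return $layer->{$key} if exists $layer->{$key};
    }
    return;
}

# Illustrative values only.
my %defaults    = (required_score => 5, report_safe => 1);
my %site_config = (required_score => 4);
my @add_config  = ({ required_score => 7 });   # per-request configs

# Order from the comment: add_config first, then site config, then defaults.
my @layers = (@add_config, \%site_config, \%defaults);

print lookup_setting('required_score', @layers), "\n";  # taken from add_config
print lookup_setting('report_safe',    @layers), "\n";  # falls through to defaults
```

Because the per-request layers are only consulted, never merged into the site
config, nothing has to be copied or restored between requests.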
> RE: comment #5. For whatever weight my newbie status carries I think this
> should be a separate package at this point. For one it requires a host of
> new dependencies to run. Also, there's simply no way it's as stable as it
> will be in a few months.
If it's not integrated, it will never be used and will never become stable.
It can live in its own directory and be installed only if mod_perl is
available.
Yes, it's probably buggy; I have not received a single success or failure
report, so it has probably only been tested by me.