Re: Google Summer of Code

Justin Mason Mon, 23 Mar 2009 13:27:55 -0700

On Mon, Mar 23, 2009 at 18:47, Mark Martinec wrote:
>> I think there may still be a meta bug in the bugzilla... worth
>> checking it for ideas.
>
> All I could find was:
>  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4917
> but it is empty and closed.


found it.  follow the "depends on" links from
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4560

> Some ideas can be found as enhancement requests in the bugzilla.
>
>
> Here are some others that come to mind:
>
> - 'a bugathlon': there are many open bugs, and some of them are
> rather small things to fix. Some may even have been fixed already
> and simply forgotten. It would be nice to go systematically through
> the list, doing some triage, and fix the more straightforward ones.
>
> - the M::SA::Message::Metadata::Received::parse_received_line
> looks like one big ad-hoc mess of exceptions. I've dreamed that
> a general (but permissive) parser of the syntax prescribed in
> RFC 2821 could cover 2/3 of the cases, leaving only the remaining
> exceptions to be handled separately.
>
> - there is basic IPv6 support in SA, but it seems there are
> several corner cases where IPv6 addresses are not recognized or
> supported. Likely candidates (just guessing): RBL lookups, Received
> header field parsing, some DNS lookups in plugins, querying for AAAA
> in addition to A records, .ip6.arpa for reverse queries, and maybe
> spamc/spamd. It would be nice to go systematically across features,
> checking or fixing their IPv6 support.
>
> - my personal pet peeve: cleanly separating checking of a message
> from score generation and from reporting. This would make it possible
> (when using SA at an MTA level) to run a multi-recipient message
> through checks once, then produce a per-recipient score and/or
> per-recipient report individually for each recipient without having
> to re-run the rules. Most rules are already compatible with this:
> checking could just collect the set of rule names that fire, and
> assigning and summing up scores could be done as a separate step.
> The missing details are excluding rules which have a zero score for
> all recipients of a message, short-circuiting, and per-recipient Bayes.
> Some stats indicate that a message has 1.5 recipients on average,
> which means avoiding the extra half a rule run per message (about a
> third of the total checking time) almost for free when running in the
> MTA integration mode, while still preserving many per-recipient features.
>
> - dealing with mail messages of arbitrary size: the rules and plugins
> which need it could have access to the complete message kept in a file
> (e.g. for checking DKIM signatures, or processing large attached
> pictures or documents), while the rest can continue to work with an
> in-memory copy, truncated to a manageable size if necessary.
> spamc could, for example, pass a file name to spamd (when both
> are running on the same host), instead of having to feed the mail
> contents through a pipe/socket.
>
>  Mark
>
>
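To illustrate the IPv6 item above, here is a minimal sketch (Python for illustration only, not SpamAssassin code) of how the query names differ between IPv4 and IPv6. Reverse (PTR) lookups and DNSBL/RBL lookups both reverse the address: octet by octet for IPv4, and nibble by nibble over the fully expanded 32 hex digits for IPv6, with .ip6.arpa in place of .in-addr.arpa. The nibble form for IPv6 DNSBLs is the convention later documented in RFC 5782. The helper also accepts RFC 2821 address literals such as "[IPv6:2001:db8::25]", as they can appear in Received header fields. All function names and the zone bl.example.org are hypothetical.

```python
import ipaddress


def from_address_literal(literal):
    """Turn an RFC 2821 address literal such as "[192.0.2.1]" or
    "[IPv6:2001:db8::1]" into a normalized bare address string."""
    inner = literal.strip().strip("[]")
    if inner.lower().startswith("ipv6:"):
        inner = inner[len("ipv6:"):]
    return str(ipaddress.ip_address(inner))


def reverse_labels(ip):
    """Labels of an address in reversed order, as used by reverse DNS and
    DNSBL queries: octets for IPv4, hex nibbles for IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return str(addr).split(".")[::-1]
    # IPv6: expand to all 32 hex digits (no "::" shortcut), then reverse
    nibbles = addr.exploded.replace(":", "")
    return list(reversed(nibbles))


def ptr_name(ip):
    """Name to query for a PTR record (reverse lookup)."""
    version = ipaddress.ip_address(ip).version
    suffix = "in-addr.arpa" if version == 4 else "ip6.arpa"
    return ".".join(reverse_labels(ip) + [suffix])


def dnsbl_name(ip, zone):
    """Name to query in a DNSBL zone (hypothetical zone name passed in)."""
    return ".".join(reverse_labels(ip) + [zone])


if __name__ == "__main__":
    addr = from_address_literal("[IPv6:2001:db8::25]")
    print(ptr_name(addr))
    print(dnsbl_name(addr, "bl.example.org"))
```

The standard library's `ipaddress` objects also expose a `reverse_pointer` attribute that yields the same PTR names; spelling the reversal out makes it reusable for DNSBL zones.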

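The check-once, score-per-recipient split from the item on separating checking from score generation can be sketched roughly as follows. This is Python for illustration only, not SpamAssassin's Perl API. The rule names, toy predicates, scores and the per-recipient preference layout are all hypothetical. The sketch covers skipping rules that score zero for every recipient, but not short-circuiting or per-recipient Bayes.

```python
# Hypothetical sketch: run the rules once per message, score per recipient.
# Rule names, predicates and scores are made up for illustration.

RULES = {
    "R_MONEY": lambda msg: "$$$" in msg,
    "R_URGENT": lambda msg: "urgent" in msg.lower(),
    "R_SHOUT": lambda msg: msg.isupper(),
}


def rules_worth_running(recipients):
    """A rule can be skipped only if it scores zero for every recipient."""
    return {
        name
        for name in RULES
        if any(p["scores"].get(name, 0.0) != 0.0 for p in recipients.values())
    }


def run_checks(msg, enabled):
    """Recipient-independent pass: return the names of the rules that fire."""
    return {name for name in enabled if RULES[name](msg)}


def score_for(fired, prefs):
    """Per-recipient pass: sum this recipient's scores for the fired rules."""
    hits = sorted(name for name in fired if prefs["scores"].get(name, 0.0) != 0.0)
    total = sum(prefs["scores"][name] for name in hits)
    return total, total >= prefs["required"], hits


recipients = {
    "alice@example.com": {"required": 5.0, "scores": {"R_MONEY": 2.5, "R_URGENT": 3.0}},
    "bob@example.com": {"required": 5.0, "scores": {"R_MONEY": 1.0, "R_URGENT": 0.0}},
}

msg = "URGENT: send $$$ now"
fired = run_checks(msg, rules_worth_running(recipients))  # rules run only once
for rcpt, prefs in recipients.items():
    print(rcpt, score_for(fired, prefs))
```

Because `run_checks` never looks at recipient preferences, its result can be shared across all recipients; only the cheap summation in `score_for` is repeated per recipient.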