So in thinking about the class cleanup we've been wanting to do for a
while; I think the top items on the list (my list at least) are:
- rename the Mail::SpamAssassin::PerMsgStatus class
- break it up into multiple, smaller classes
So here's what I propose for the first one.
Rename the Mail::SpamAssassin::PerMsgStatus class
-------------------------------------------------
Initially the purpose was as a "per-message status" object, describing the
results of a scan of one message -- in other words, that message's spam
status -- "is it spam or not?". I think we all now agree the name is
not so hot ;)
It's purpose has eventually turned out to be:
- (public) methods that actually cause the scan to happen
- (public) the results of a scan operation
- (public) message rewriting functionality
- (internally, plugins) state for a scan operation; plugins and our code
can store state on this object for the duration of the scan
- (plugins) a set of APIs to access aspects of the message being
scanned, and plugin support APIs
- (internally) methods that implement Eval tests
- (internally) methods that control how tests are run,
their ordering etc.
- (internally) methods that implement the DNS event-driven algorithm
- (internally) methods that perform auto-learning
- (internally) methods that compile tests into perl bytecode at runtime
- (internally) methods that parse aspects of the message
- (internally) the tests compiled as perl bytecode
So in my opinion, in cases like this where there's lots of internal and
external APIs, it's more sensible to name the class after what it's
external APIs do. (in fact, most OO design would indicate that this
means you need to refactor out into >1 class. I'm getting to that ;)
So, I think "Mail::SpamAssassin::Scan" is a better name -- the object
returned from M::SpamAssassin::check() is the results of a scan of a
single message.
("Scanner" is another poss, but I think "Scan" is better because we aren't
returning the object that *did* the scan, we're returning the *results* of
the scan.)
The next thing is backwards compatibility. We can only do this if we
don't break third-party code. We *can* rename *this* class without
breaking backwards compatibility, thankfully. Our requirements here are:
- 1. plugins and third-party perl code will very likely contain "use
Mail::SpamAssassin::PerMsgStatus;" lines, so having some kind of
"use"able file there, is a MUST.
- 2. there are possibly locations in third-party code where a
Mail::SpamAssassin::PerMsgStatus object is created other than through
the Mail::SpamAssassin::check() API, so being able to support that is a
SHOULD.
- 3. However, the majority of callers should not be creating PerMsgStatus
objects directly, or depending in any way on the object being of
that specific type. (hooray: perl's not strongly typed! ;)
Here's how I propose to do that:
- rename the current Mail::SpamAssassin::PerMsgStatus class to
Mail::SpamAssassin::Scan
- create a "Facade" Mail::SpamAssassin::PerMsgStatus object that is a
sub-class of ::Scan, with no additional methods or data. in other
words, all method calls and member var accesses will fall through into
::Scan.
- If we deprecate any 3.0.0 APIs in the 3.1.0 cycle, we can move
their "backwards compatibility" methods into that facade class,
because 3.1.0 code will be "Scan-native".
- keep the facade object around for a while, at least until the next
major cycle, because it's super-cheap; we won't even have a "use" line
for it in our code, so it'll take up roughly 200 bytes on disk and
that's it.
Sound useful?
in my opinion this is definitely useful in the 3.1.0 tree.
Break it up into multiple, smaller classes
------------------------------------------
OK, part the second. in my opinion, this is also a very good idea -- as
the XP guys say, PerMsgStatus has a bad "code smell" -- it's a big file
with lots of totally different functionality mushed into one class. In
fact, it even loads methods from *multiple* files, which is totally
nasty. ;)
Here's some more details about what APIs are on the ::PerMsgStatus object
(or the ::Scan object as it may be renamed):
- (public) methods that actually cause the scan to happen
check
This should be left as a public API, but its code moved to a new
class. see "(internally) methods that control how tests are run,
their ordering etc" below.
- (public) the results of a scan operation
is_spam
get_names_of_tests_hit / get_names_of_subtests_hit
get_score / get_hits
get_required_score / get_required_hits
get_autolearn_status / get_report
get_content_preview
finish
These are the main thing that the ::Scan object does, so they stay.
- (public) message rewriting functionality
rewrite_mail
move code into another class; leave this public API on the ::Scan
object which calls into that class.
Proposed class name: Mail::SpamAssassin::Scan::Rewriter?
- (internally, plugins) state for a scan operation; plugins
and our code can store state on this object for the
duration of the scan.
[no methods involved, just the obj itself]
I don't see any need to change this.
- (plugins) a set of APIs to access aspects of the message
being scanned, and plugin support APIs
get_message / get_current_eval_rule_name
get_uri_list / get
create_fulltext_tmpfile
The first two, I think make sense to leave on ::Scan. Last three,
possibly move the logic into another class, but leave the API on
::Scan. In particular, get_uri_list() and get() are both
more appropriate somewhere in the Mail::SpamAssassin::Message
hierarchy.
- (internally) methods that implement Eval tests
[entire contents of EvalTests.pm which
do this horrible hack of putting themselves into
the PerMsgStatus namespace]
move code into another namespace. Eval tests use the
PerMsgStatus object as $self, and since they're just
functions, not objects themselves, that doesn't need to
change -- they'd still get the ::Scan object as their
first arg.
Proposed namespace: Mail::SpamAssassin::Test::Eval?
- (internally) methods that control how tests are run,
their ordering etc.
[parts of check]
[parts of do_head_tests / etc. ]
Definitely move.
Proposed class: Mail::SpamAssassin::TestRunner?
RunTests? Runner? Scanner?
- (internally) methods that implement the DNS event-driven algorithm
[entire contents of Dns.pm which do this horrible hack of putting
themselves into the PerMsgStatus namespace]
into Mail::SpamAssassin::TestRunner as above?
- (internally) methods that perform auto-learning
learn
Proposed class: Mail::SpamAssassin::AutoLearn? (I don't think
mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes
sense, so a new class would be better.)
- (internally) methods that compile tests into perl bytecode at runtime
do_head_tests / do_body_tests / do_body_uri_tests
do_rawbody_tests etc.
into Mail::SpamAssassin::TestRunner as above?
- (internally) the tests compiled as perl bytecode
MAKE_MONEY_FAST_body_test etc.
like eval tests, the functions here get the Scan object passed
in as arg 1, and that doesn't need to change.
Proposed namespace: Mail::SpamAssassin::Test::Body, etc., with a
namespace for each type of test? or should we carry on mushing them
into one namespace, Mail::SpamAssassin::Test::Compiled?
--j.