So in thinking about the class cleanup we've been wanting to do for a while; I think the top items on the list (my list at least) are:
- rename the Mail::SpamAssassin::PerMsgStatus class - break it up into multiple, smaller classes So here's what I propose for the first one. Rename the Mail::SpamAssassin::PerMsgStatus class ------------------------------------------------- Initially the purpose was as a "per-message status" object, describing the results of a scan of one message -- in other words, that message's spam status -- "is it spam or not?". I think we all now agree the name is not so hot ;) It's purpose has eventually turned out to be: - (public) methods that actually cause the scan to happen - (public) the results of a scan operation - (public) message rewriting functionality - (internally, plugins) state for a scan operation; plugins and our code can store state on this object for the duration of the scan - (plugins) a set of APIs to access aspects of the message being scanned, and plugin support APIs - (internally) methods that implement Eval tests - (internally) methods that control how tests are run, their ordering etc. - (internally) methods that implement the DNS event-driven algorithm - (internally) methods that perform auto-learning - (internally) methods that compile tests into perl bytecode at runtime - (internally) methods that parse aspects of the message - (internally) the tests compiled as perl bytecode So in my opinion, in cases like this where there's lots of internal and external APIs, it's more sensible to name the class after what it's external APIs do. (in fact, most OO design would indicate that this means you need to refactor out into >1 class. I'm getting to that ;) So, I think "Mail::SpamAssassin::Scan" is a better name -- the object returned from M::SpamAssassin::check() is the results of a scan of a single message. ("Scanner" is another poss, but I think "Scan" is better because we aren't returning the object that *did* the scan, we're returning the *results* of the scan.) The next thing is backwards compatibility. We can only do this if we don't break third-party code. We *can* rename *this* class without breaking backwards compatibility, thankfully. Our requirements here are: - 1. plugins and third-party perl code will very likely contain "use Mail::SpamAssassin::PerMsgStatus;" lines, so having some kind of "use"able file there, is a MUST. - 2. there are possibly locations in third-party code where a Mail::SpamAssassin::PerMsgStatus object is created other than through the Mail::SpamAssassin::check() API, so being able to support that is a SHOULD. - 3. However, the majority of callers should not be creating PerMsgStatus objects directly, or depending in any way on the object being of that specific type. (hooray: perl's not strongly typed! ;) Here's how I propose to do that: - rename the current Mail::SpamAssassin::PerMsgStatus class to Mail::SpamAssassin::Scan - create a "Facade" Mail::SpamAssassin::PerMsgStatus object that is a sub-class of ::Scan, with no additional methods or data. in other words, all method calls and member var accesses will fall through into ::Scan. - If we deprecate any 3.0.0 APIs in the 3.1.0 cycle, we can move their "backwards compatibility" methods into that facade class, because 3.1.0 code will be "Scan-native". - keep the facade object around for a while, at least until the next major cycle, because it's super-cheap; we won't even have a "use" line for it in our code, so it'll take up roughly 200 bytes on disk and that's it. Sound useful? in my opinion this is definitely useful in the 3.1.0 tree. Break it up into multiple, smaller classes ------------------------------------------ OK, part the second. in my opinion, this is also a very good idea -- as the XP guys say, PerMsgStatus has a bad "code smell" -- it's a big file with lots of totally different functionality mushed into one class. In fact, it even loads methods from *multiple* files, which is totally nasty. ;) Here's some more details about what APIs are on the ::PerMsgStatus object (or the ::Scan object as it may be renamed): - (public) methods that actually cause the scan to happen check This should be left as a public API, but its code moved to a new class. see "(internally) methods that control how tests are run, their ordering etc" below. - (public) the results of a scan operation is_spam get_names_of_tests_hit / get_names_of_subtests_hit get_score / get_hits get_required_score / get_required_hits get_autolearn_status / get_report get_content_preview finish These are the main thing that the ::Scan object does, so they stay. - (public) message rewriting functionality rewrite_mail move code into another class; leave this public API on the ::Scan object which calls into that class. Proposed class name: Mail::SpamAssassin::Scan::Rewriter? - (internally, plugins) state for a scan operation; plugins and our code can store state on this object for the duration of the scan. [no methods involved, just the obj itself] I don't see any need to change this. - (plugins) a set of APIs to access aspects of the message being scanned, and plugin support APIs get_message / get_current_eval_rule_name get_uri_list / get create_fulltext_tmpfile The first two, I think make sense to leave on ::Scan. Last three, possibly move the logic into another class, but leave the API on ::Scan. In particular, get_uri_list() and get() are both more appropriate somewhere in the Mail::SpamAssassin::Message hierarchy. - (internally) methods that implement Eval tests [entire contents of EvalTests.pm which do this horrible hack of putting themselves into the PerMsgStatus namespace] move code into another namespace. Eval tests use the PerMsgStatus object as $self, and since they're just functions, not objects themselves, that doesn't need to change -- they'd still get the ::Scan object as their first arg. Proposed namespace: Mail::SpamAssassin::Test::Eval? - (internally) methods that control how tests are run, their ordering etc. [parts of check] [parts of do_head_tests / etc. ] Definitely move. Proposed class: Mail::SpamAssassin::TestRunner? RunTests? Runner? Scanner? - (internally) methods that implement the DNS event-driven algorithm [entire contents of Dns.pm which do this horrible hack of putting themselves into the PerMsgStatus namespace] into Mail::SpamAssassin::TestRunner as above? - (internally) methods that perform auto-learning learn Proposed class: Mail::SpamAssassin::AutoLearn? (I don't think mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes sense, so a new class would be better.) - (internally) methods that compile tests into perl bytecode at runtime do_head_tests / do_body_tests / do_body_uri_tests do_rawbody_tests etc. into Mail::SpamAssassin::TestRunner as above? - (internally) the tests compiled as perl bytecode MAKE_MONEY_FAST_body_test etc. like eval tests, the functions here get the Scan object passed in as arg 1, and that doesn't need to change. Proposed namespace: Mail::SpamAssassin::Test::Body, etc., with a namespace for each type of test? or should we carry on mushing them into one namespace, Mail::SpamAssassin::Test::Compiled? --j.