So in thinking about the class cleanup we've been wanting to do for a
while; I think the top items on the list (my list at least) are:

  - rename the Mail::SpamAssassin::PerMsgStatus class

  - break it up into multiple, smaller classes

So here's what I propose for the first one.


Rename the Mail::SpamAssassin::PerMsgStatus class
-------------------------------------------------

Initially the purpose was as a "per-message status" object, describing the
results of a scan of one message -- in other words, that message's spam
status -- "is it spam or not?". I think we all now agree the name is
not so hot ;)

It's purpose has eventually turned out to be:

  - (public) methods that actually cause the scan to happen
  - (public) the results of a scan operation
  - (public) message rewriting functionality

  - (internally, plugins) state for a scan operation; plugins and our code
    can store state on this object for the duration of the scan
  - (plugins) a set of APIs to access aspects of the message being
    scanned, and plugin support APIs

  - (internally) methods that implement Eval tests
  - (internally) methods that control how tests are run,
    their ordering etc.
  - (internally) methods that implement the DNS event-driven algorithm
  - (internally) methods that perform auto-learning
  - (internally) methods that compile tests into perl bytecode at runtime
  - (internally) methods that parse aspects of the message
  - (internally) the tests compiled as perl bytecode


So in my opinion, in cases like this where there's lots of internal and
external APIs, it's more sensible to name the class after what it's
external APIs do.   (in fact, most OO design would indicate that this
means you need to refactor out into >1 class. I'm getting to that ;)

So, I think "Mail::SpamAssassin::Scan" is a better name -- the object
returned from M::SpamAssassin::check() is the results of a scan of a
single message.

("Scanner" is another poss, but I think "Scan" is better because we aren't
returning the object that *did* the scan, we're returning the *results* of
the scan.)


The next thing is backwards compatibility.   We can only do this if we
don't break third-party code.   We *can* rename *this* class without
breaking backwards compatibility, thankfully.  Our requirements here are:

- 1. plugins and third-party perl code will very likely contain "use
  Mail::SpamAssassin::PerMsgStatus;" lines, so having some kind of
  "use"able file there, is a MUST.

- 2. there are possibly locations in third-party code where a
  Mail::SpamAssassin::PerMsgStatus object is created other than through
  the Mail::SpamAssassin::check() API, so being able to support that is a
  SHOULD.   
  
- 3. However, the majority of callers should not be creating PerMsgStatus
  objects directly, or depending in any way on the object being of
  that specific type.  (hooray: perl's not strongly typed! ;)

Here's how I propose to do that:

  - rename the current Mail::SpamAssassin::PerMsgStatus class to
    Mail::SpamAssassin::Scan

  - create a "Facade" Mail::SpamAssassin::PerMsgStatus object that is a
    sub-class of ::Scan, with no additional methods or data.  in other
    words, all method calls and member var accesses will fall through into
    ::Scan.

  - If we deprecate any 3.0.0 APIs in the 3.1.0 cycle, we can move
    their "backwards compatibility" methods into that facade class,
    because 3.1.0 code will be "Scan-native".

  - keep the facade object around for a while, at least until the next
    major cycle, because it's super-cheap; we won't even have a "use" line
    for it in our code, so it'll take up roughly 200 bytes on disk and
    that's it.

Sound useful?

in my opinion this is definitely useful in the 3.1.0 tree.




Break it up into multiple, smaller classes
------------------------------------------

OK, part the second.  in my opinion, this is also a very good idea -- as
the XP guys say, PerMsgStatus has a bad "code smell" -- it's a big file
with lots of totally different functionality mushed into one class.  In
fact, it even loads methods from *multiple* files, which is totally
nasty. ;)

Here's some more details about what APIs are on the ::PerMsgStatus object
(or the ::Scan object as it may be renamed):
 
  - (public) methods that actually cause the scan to happen

        check

    This should be left as a public API, but its code moved to a new
    class.   see "(internally) methods that control how tests are run,
    their ordering etc" below.

  - (public) the results of a scan operation

        is_spam
        get_names_of_tests_hit / get_names_of_subtests_hit
        get_score / get_hits
        get_required_score / get_required_hits
        get_autolearn_status / get_report
        get_content_preview
        finish

    These are the main thing that the ::Scan object does, so they stay.

  - (public) message rewriting functionality

        rewrite_mail

    move code into another class; leave this public API on the ::Scan
    object which calls into that class.
    Proposed class name: Mail::SpamAssassin::Scan::Rewriter?

  - (internally, plugins) state for a scan operation; plugins
    and our code can store state on this object for the
    duration of the scan.

        [no methods involved, just the obj itself]

    I don't see any need to change this.

  - (plugins) a set of APIs to access aspects of the message
    being scanned, and plugin support APIs

        get_message / get_current_eval_rule_name
        get_uri_list / get
        create_fulltext_tmpfile

    The first two, I think make sense to leave on ::Scan. Last three,
    possibly move the logic into another class, but leave the API on
    ::Scan.   In particular, get_uri_list() and get() are both
    more appropriate somewhere in the Mail::SpamAssassin::Message
    hierarchy.

  - (internally) methods that implement Eval tests

        [entire contents of EvalTests.pm which
        do this horrible hack of putting themselves into
        the PerMsgStatus namespace]

    move code into another namespace.  Eval tests use the
    PerMsgStatus object as $self, and since they're just
    functions, not objects themselves, that doesn't need to
    change -- they'd still get the ::Scan object as their
    first arg.

    Proposed namespace: Mail::SpamAssassin::Test::Eval?

  - (internally) methods that control how tests are run,
    their ordering etc.

        [parts of check]
        [parts of do_head_tests / etc. ]

    Definitely move.
    Proposed class: Mail::SpamAssassin::TestRunner?
    RunTests? Runner? Scanner?

  - (internally) methods that implement the DNS event-driven algorithm

        [entire contents of Dns.pm which do this horrible hack of putting
        themselves into the PerMsgStatus namespace]

    into Mail::SpamAssassin::TestRunner as above?

  - (internally) methods that perform auto-learning

        learn

    Proposed class: Mail::SpamAssassin::AutoLearn?   (I don't think
    mushing into PerMsgLearner, Bayes, or Mail::SpamAssassin makes
    sense, so a new class would be better.)

  - (internally) methods that compile tests into perl bytecode at runtime

        do_head_tests / do_body_tests / do_body_uri_tests
        do_rawbody_tests etc.

    into Mail::SpamAssassin::TestRunner as above?

  - (internally) the tests compiled as perl bytecode

        MAKE_MONEY_FAST_body_test etc.

    like eval tests, the functions here get the Scan object passed
    in as arg 1, and that doesn't need to change.
    Proposed namespace: Mail::SpamAssassin::Test::Body, etc., with a
    namespace for each type of test?  or should we carry on mushing them
    into one namespace, Mail::SpamAssassin::Test::Compiled?

--j.

Reply via email to