There's some room for optimization in what you proposed, but your p-code is close to what I was thinking. in essence, it is the Strategy pattern that calls the inspector logic, combined with a listener for scoring changes (aka Visitor pattern) that interested callers would subscribe to. That works.
After having originally suggested an "InspectionInterruptedException" so that inspectors could abort a running inspection chain, I now think it's better to simply return an object from the listener/visited method, rather than throw an exception. Exceptions could be perceived by an implementer as a little trickier to debug. I'd rather make the API simpler. FYI, the workflow package isn't totally suitable here. When you cited the Decision class, I suspect you actually meant the Outcome class, no? Ok. I've got what I need to do this properly. This should work pretty well. Andrew On Mon, Sep 28, 2009 at 3:11 AM, Janne Jalkanen <[email protected]> wrote: > > Yah, it's just that smaller numbers are usually easier to deal with (numbers > from 1-20). We need more resolution at the low end, not at the high end. > That's my preference for using floats. > > A simple way to do as you suggest would be to have a Score return value, > e.g. > > public Score inspect(...) > { > // Detection score goes here > > if( isSpam ) > return new Score( "mywonderfulspamdetectortest", 0.5f ); > > return Score.OK; // where score.OK is defined to be 0.0f. > } > > where the first String is a key and the float is a default value, which can > be overridden in a config file using the key. This would be pretty close to > the SpamAssassin model (and close to the java.util.Properties conceptually, > so it would be easy to adapt). > > (The reason why I'm proposing a default value is so that when you add a new > Inspector, you don't have to modify two files.) > > After each Inspector would be called, then the parent would call the > callback I proposed, and that would then determine what to do. In > pseudocode: > > // JSPWiki internal code. > private void inspect( DecisionMaker decisionMaker, Change change, <some > other values> ) > { > ArrayList scores; > for( Inspector inspector : m_inspectorChain ) > { > Score s = inspector.inspect( change, <some values> ); > > scores.add( s ); > > if( decisionMaker.visit( scores ) == Decision.SPAM ) { ... } > } > } > > We can then provide something like this > > class SpamFilter implements PageFilter, DecisionMaker > { > ... > > // This emulates the 2.8.x SpamFilter functionality. > public Decision visit( List<Score> scores ) > { > if( scores.last().getValue() > 0.0 ) return Decision.SPAM; > > return Decision.CONTINUE; > } > } > > (At any rate, it might make sense to reuse some classes from the workflow > package here; after all, this is a machine workflow in a sense, is it not?) > > (Another possible pattern would be to have the DecisionMaker throw an > exception. Don't quite know what's better.) > > /Janne > > On Sep 28, 2009, at 02:04 , Andrew Jaquith wrote: > >> It's only more complicated in the sense that I'd have to change some >> recently-written code. :) >> >> Both integer and floating point -- as you point out -- are really the >> same except for the choice of where to put the decimal point. I chose >> integer to preserve the same scoring strategy from the previous >> design. Basically: add stuff up, and if the total exceeds a theshold, >> you've got spam. >> >> I took a quick look at SpamAssassin. They do use floating point >> scores. I like the way they allow weights to be customized for each >> rule. That seems like it would be a good enhancement for us, too. >> >> So, it seems to me that if we go floating-point, we should make it >> possible to configure custom weights for each Inspector. That also >> implies that it would be better to simply have Inspector.inspect() >> simply return a "pass/fail/no result" result and have the parent >> Inspection object figure out what to add or subtract from the running >> total. This would mean that the Inspectors wouldn't be able to modify >> scores directly. And it would be closer to the SpamAssessin model, and >> would probably simpler to code for too. >> >> Reasonable? >> >> On Sun, Sep 27, 2009 at 4:12 PM, Janne Jalkanen >> <[email protected]> wrote: >>> >>> Heya! >>> >>> I don't quite understand - why would a floating point value be any more >>> complicated than an integer? >>> >>> Note that at the moment we mark a modification a spam if a single test >>> fails, which essentially means doing logical OR more than any sort of an >>> addition (=threshold is always 1). The integer is there more or less as >>> >>> something-which-was-never-completed-and-I-am-almost-sure-is-the-wrong-design. >>> In fact, I did lament the fact that you can't say that "X is a bit more >>> efficient than Y" - with an integer-based design you essentially go for >>> fixed point arithmetics (test A gives you 100, test B gives you 150, etc, >>> where you just shifted decimal points to the right just for the heck of >>> it). >>> >>> If you take a look at SpamAssassin, it uses floats. >>> >>> I'd hesitate changing it later, since it will create an API contract. >>> >>> /Janne >>> >>> On Sep 27, 2009, at 22:34 , Andrew Jaquith wrote: >>> >>>> Janne --- good critique, and good suggestions. I've killed the >>>> limit-setting methods on the Inspection class itself in favor of a >>>> listener >>>> strategy. I adopteed your other suggestions too, except... >>>> >>>> ...at the moment I am inclined to leave the scoring scale as >>>> integer-based >>>> for now. I like it because it is simple, and resembles what we do now. >>>> We >>>> can change it later if need be. >>>> >>>> In answer to your first question: the inspector package does not help >>>> graduation, although it does help the beta that comes after the diploma. >>>> :) >>>> >>>> Andrew >>>> >>>> On Sep 27, 2009, at 4:37, Janne Jalkanen <[email protected]> >>>> wrote: >>>> >>>>> >>>>> My main concern: does this advance our graduation? >>>>> >>>>> Others inline. >>>>> >>>>> On 25 Sep 2009, at 20:54, Andrew Jaquith wrote: >>>>> >>>>>> The "inspect" package provides facilities that allow content to be >>>>>> inspected and scored based on various criteria such as whether a wiki >>>>>> page change contains spam. Content may be scored by initializing an >>>>>> Inspection object and then calling the {...@link Inspection#inspect()} >>>>>> method. The {...@code inspect} method, in turn, iterates through a >>>>>> variety of previously-configured {...@link Inspector} objects and calls >>>>>> the {...@link Inspector#inspect(String,String)} method for each one. Each >>>>>> of the configured inspectors is free to perform whatever evaluations >>>>>> it wishes, and can increase, decrease or reset the "scores" for any >>>>>> score category, for example the {...@link Score#SPAM} category. >>>>> >>>>> Ok. >>>>> >>>>>> * Score objects supply instructions to the parent Inspection object to >>>>>> increment, decrement or reset the score for a particular category. >>>>>> Each Score object is constructed with a category (for example, >>>>>> Score.SPAM), an integer indicating how much to change the score, and >>>>>> an optional String message that provides context for the change. For >>>>>> example, a Score that increments the spam score by 1 could be >>>>>> constructed by new Score( Score.SPAM, 1, "Bot detected." ). Negative >>>>>> numbers can be supplied also to decrease the score. For convenience, >>>>>> {...@link Score#INCREMENT_SCORE} means "add 1", {...@link >>>>>> Score#DECREMENT_SCORE} means "subtract 1", and {...@link Score#RESET} >>>>>> means "reset to zero." >>>>> >>>>> Smells of overdesign to me. Creating a new operation and methodology >>>>> to >>>>> support addition sounds like a bad idea. Not to mention the fact that >>>>> you >>>>> might actually want to use fractions. >>>>> >>>>> Something like inspection.changeScore( Score.SPAM, 0.5f ) sounds more >>>>> appealing to me. >>>>> >>>>>> public String preSave( WikiContext context, String content ) throws >>>>>> RedirectException >>>>>> { >>>>>> Change change = getChange( context, content ); >>>>>> >>>>>> // Run the Inspection >>>>>> Inspection inspection = new Inspection( context, m_config, >>>>>> m_inspectors >>>>>> ); >>>>>> inspection.addLimit( Score.SPAM, m_scoreLimit ); >>>>>> inspection.inspect( content, change ); >>>>>> int spamScore = inspection.getScore( Score.SPAM ); >>>>>> context.setVariable( ATTR_SPAMFILTER_SCORE, spamScore ); >>>>>> >>>>>> // Redirect user if score too high >>>>>> if( inspection.isFailed() ) >>>>>> { >>>>>> ... >>>>>> } >>>>>> ... >>>>>> } >>>>> >>>>> Yes, this looks unnecessarily complicated and limiting to me. I would >>>>> remove automatic limit-rating altogether, and would just concentrate on >>>>> refactoring the individual methods from SpamFilter into the >>>>> inspect-package >>>>> with a light wrapper around them. I think a callback design is better >>>>> than >>>>> adding quite limited limit thingies anyway, as this gives full >>>>> programmatic >>>>> control to the developer without us having to pre-guess the possible >>>>> limitations. >>>>> >>>>> E.g. inspection.inspect( content, change, new >>>>> InspectionResultListener() >>>>> { >>>>> public boolean visit(Inspection ins) >>>>> { >>>>> // Any possible decision code goes here >>>>> return ins.getScore() > m_maxScore; >>>>> } >>>>> }); >>>>> >>>>> Obviously, these listeners would be optional, and you could just let >>>>> the >>>>> entire chain run without interference. And for simplicity, we can >>>>> provide >>>>> some default listeners, e.g. >>>>> >>>>> inspection.inspect( content, change, new StopAtScoreListener( >>>>> Score.SPAM, >>>>> 5 ) ); >>>>> >>>>> (Yeah, I'm a big fan of this pattern 'cos it gives much more control to >>>>> the developer and eases debugging immensely.) >>>>> >>>>>> ChangeRateInspector, LinkCountInspector, SpamInspector (for Akismet), >>>>> >>>>> This should obviously be called AkismetInspector; all the others detect >>>>> spammers too (and Akismet is probably the worst of the lot anyway). >>>>> >>>>> /Janne >>> >>> > >
