Janne, I picked the really nice option. :) The solution is that when a post contains spam, we redirect back to the editor page and request that a CAPTCHA be displayed. Re-editing is allowed.
Here is how it works. There are two collaborating parts: SpamProtectTag and SpamInterceptor. This is where we do a little magic. :)

Let's say you've loaded the editor for the first time (i.e., you haven't submitted yet). When SpamProtectTag executes, it writes out a special parameter, a "challenge request." On the FIRST GET, its contents are the string value of the enum Challenge.Request.CHALLENGE_ON_DEMAND. This means "no CAPTCHA is required, but when we interpret the post, get ready to generate one after the redirect if there's spam in it." We then encrypt the parameter using CryptoUtil.

When SpamInterceptor intercepts the POST, it looks for the special challenge-request parameter. Two things can happen: a normal user submits (in which case the challenge-request parameter will be there), or a spammer submits (in which case it will not be).

In the normal case, we extract the challenge-request parameter, decrypt its contents and find that its value was CHALLENGE_ON_DEMAND. Because it has this value, we do NOT run the CAPTCHA validator. We always run the content inspection. If the content contains spam, we add a ValidationError. If not, we return a null Resolution, the "save" event method executes further down the chain, and we are done.

Now let's look at the spammer case. If the challenge-request parameter is not present in the request, we KNOW that the user has been naughty or is a spammer. So we add a ValidationError and redirect to the editor again.

On the second GET (i.e., after the POST and the redirect back to the editor page), SpamProtectTag executes again. This time it knows there was spam, because of the ValidationError, so it writes out the enum value Challenge.Request.CAPTCHA, which means "I just rendered a CAPTCHA; when SpamInterceptor intercepts the post, validate it." Thus, the next time SpamInterceptor handles the post and sees the CAPTCHA value, it knows that it should do the CAPTCHA check.
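To make the state machine concrete, here is a minimal Java sketch of the decisions SpamProtectTag and SpamInterceptor make. It is not the JSPWiki code: only Challenge.Request and its CAPTCHA and PASSWORD values come from this thread, while ChallengeFlow, Outcome, stateForGet and onPost are hypothetical names. Encryption, Stripes and the real validators are left out. The sketch takes "we always run the content inspection" literally, so a correct CAPTCHA alone does not let spammy content through; the real code may weigh the two differently.

```java
import java.util.Optional;

// Simplified model of the challenge-request protocol described above.
// Challenge.Request, CAPTCHA and PASSWORD come from the thread; ChallengeFlow,
// Outcome, stateForGet and onPost are hypothetical names for illustration.
final class ChallengeFlow {

    // State that SpamProtectTag writes (encrypted) into the challenge-request parameter.
    // The thread calls the first value both CAPTCHA_ON_DEMAND and CHALLENGE_ON_DEMAND.
    enum Request { CAPTCHA_ON_DEMAND, CAPTCHA, PASSWORD }

    // Hypothetical result of SpamInterceptor's check.
    enum Outcome { PROCEED_TO_EVENT, REDIRECT_WITH_ERROR }

    // SpamProtectTag, on each GET of the editor page.
    // forced: the JSP's challenge="captcha" / challenge="password" attribute, if present.
    // hadSpamError: whether the previous POST left a spam ValidationError.
    static Request stateForGet(Optional<Request> forced, boolean hadSpamError) {
        if (forced.isPresent()) {
            return forced.get();                  // always render the forced challenge
        }
        return hadSpamError ? Request.CAPTCHA : Request.CAPTCHA_ON_DEMAND;
    }

    // SpamInterceptor, on each POST.
    // state: decrypted challenge-request value, or empty if the parameter was missing.
    // challengeAnswered: correct CAPTCHA or password; ignored for CAPTCHA_ON_DEMAND.
    static Outcome onPost(Optional<Request> state, boolean challengeAnswered,
                          boolean contentIsSpam) {
        if (!state.isPresent()) {
            return Outcome.REDIRECT_WITH_ERROR;   // no parameter: assume a spammer
        }
        if (state.get() != Request.CAPTCHA_ON_DEMAND && !challengeAnswered) {
            return Outcome.REDIRECT_WITH_ERROR;   // CAPTCHA or password check failed
        }
        // Content inspection always runs, whatever the state.
        return contentIsSpam ? Outcome.REDIRECT_WITH_ERROR : Outcome.PROCEED_TO_EVENT;
    }
}
```

Walking through the spam case: a spammy first POST under CAPTCHA_ON_DEMAND returns REDIRECT_WITH_ERROR, the following GET writes CAPTCHA, and the next POST proceeds only if the CAPTCHA is answered and the content passes inspection. Because the state travels with the form rather than in a session, the flow does not depend on cookies, which Janne noted spambots tend to ignore.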
(And then we lather, rinse, repeat until the user submits a correct CAPTCHA value.)

That might sound complicated, but it's not -- the code is dead simple. The key is that SpamProtectTag writes the current state out to the challenge-request parameter: CAPTCHA_ON_DEMAND is written out for the first-time GET, and on subsequent GETs, CAPTCHA is written out if the contents are spam. All SpamInterceptor needs to do is recover that state by retrieving and decrypting the challenge-request parameter.

There is one other wrinkle: the JSP author can set the SpamProtectTag attribute "challenge" to force a CAPTCHA or a password check in all cases. If we see that attribute in the JSP, we write out the value Challenge.Request.CAPTCHA or Challenge.Request.PASSWORD and render the challenge right away, even for that first post.

Naming-wise, I've gone back and forth about what the right names for everything should be. At the moment, I think Challenge.Request might better be called Challenge.State. :) Maybe CAPTCHA_ON_DEMAND becomes CHALLENGE_NOT_RENDERED, CAPTCHA becomes CAPTCHA_RENDERED, and PASSWORD becomes PASSWORD_RENDERED? Not sure.

Oh, and one more thing. This basic technique -- encrypt some sort of state object, write it out as a hidden parameter to the form, then extract and decrypt it on POST -- is something I gleaned from looking through the Stripes code. They do a lot of "state smuggling" as an alternative to storing server-side session attributes. I think it's a nice, low-overhead technique for things like forms, which are essentially stateful. I also use this technique for smuggling the parameter names used for the spam tokens, for example.

Long post! Hope it made sense.

Andrew

On Thu, Jan 7, 2010 at 3:26 AM, Janne Jalkanen <[email protected]> wrote:
>
> Errr... How do we determine what is a previous post? Spambots tend to make
> each request from a different address and ignore cookies. Or is it so that
> if the post is determined to contain spam, you get a redirect to the editor
> page, but this time with a captcha? 'cos that would be really nice, since it
> allows you to re-edit the content.
>
> /Janne
>
> On Jan 5, 2010, at 18:10 , Andrew Jaquith wrote:
>
>> Small correction (this is what happens when you type too quickly) --
>>
>> CAPTCHAs are rendered, by default, ONLY if the previous post contains
>> spam. The missing "only" makes all the difference. :)
>>
>> The important point is that we are treating spam, essentially, as a
>> form validation error.
>>
>> If you don't submit spam, it won't produce a validation error, so you
>> won't see a CAPTCHA. (Unless the JSP requires it, for example, when
>> creating a user account).
>>
>> Andrew
>>
>> On Tue, Jan 5, 2010 at 10:46 AM, Andrew Jaquith
>> <[email protected]> wrote:
>>>
>>> Hi all --
>>>
>>> Just thought I'd send a quick update on CAPTCHA. Janne and I have had
>>> some back-channel conversations about enhancements that I needed to
>>> make.
>>>
>>> Functionally, here's how the revised system will work:
>>>
>>> - CAPTCHAs will be rendered on the same page as the submitting form,
>>> but by default if the previous post contains spam (this is in line
>>> with Janne's comments)
>>> - CAPTCHA-rendering will be the responsibility of the wiki:SpamProtect
>>> tag (as before)
>>> - wiki:SpamProtect must be added as a child of a form or stripes:form
>>> element (as before)
>>> - If the JSP author wishes, they may require a CAPTCHA by adding an
>>> attribute challenge="captcha" to the SpamProtect tag (new)
>>> - In addition, a form can require password confirmation by adding
>>> attribute challenge="password" to the SpamProtect tag (new)
>>> - All of the back-end processing will be done by SpamInterceptor, in
>>> collaboration with the content-inspection system (as before)
>>> - Stripes ActionBeans that require spam protection need only add a
>>> @SpamProtect annotation to the target event methods (as before)
>>>
>>> We will add the SpamProtect tag to the page-edit form, comment form,
>>> new user registration form, and user profile form. For new user
>>> registration, a CAPTCHA will likely be required (challenge=captcha).
>>> For user profile changes and post-install wiki configuration (coming
>>> soon!), the user's password will be required to confirm
>>> (challenge=password).
>>>
>>> So, that's the functional design -- nice and simple. And we knock out
>>> some JIRA bugs while we're at it (e.g., confirm password for account
>>> changes)...
>>>
>>> Andrew
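The "state smuggling" technique from Andrew's message (encrypt a state value, write it to a hidden form field, decrypt it on POST) can be sketched with the JDK's own javax.crypto classes. This is a minimal sketch, assuming AES-GCM as a stand-in for JSPWiki's CryptoUtil, whose API is not shown in the thread; StateSmuggler, seal and unseal are hypothetical names. GCM is used here because it authenticates as well as encrypts, so a forged or edited value fails to decrypt instead of yielding a bogus state.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts a small state string into a hidden-field value
// and decrypts it on POST. AES-GCM stands in for CryptoUtil (an assumption).
final class StateSmuggler {
    private static final int IV_BYTES = 12;   // recommended nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Server-side only, never sent to the client. Generated per instance here;
    // a cluster or a restart-safe setup would need a shared, persistent key.
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    StateSmuggler() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        this.key = gen.generateKey();
    }

    // Encrypts the state; the URL-safe Base64 result can go into a hidden input.
    String seal(String state) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);                  // fresh nonce for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(state.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];   // IV || ciphertext+tag
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    // Decrypts a submitted value. Throws GeneralSecurityException if the value
    // was forged or altered; callers can treat that like a missing parameter.
    String unseal(String sealed) throws GeneralSecurityException {
        byte[] in = Base64.getUrlDecoder().decode(sealed);
        if (in.length <= IV_BYTES) {
            throw new GeneralSecurityException("value too short");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

In this sketch, the tag would write seal("CAPTCHA_ON_DEMAND") (or the name of whichever Challenge.Request value applies) into a hidden input, and the interceptor would call unseal on the submitted value, treating a missing parameter and a GeneralSecurityException the same way. Encryption hides and authenticates the value, but it does not by itself stop a client from resubmitting a value it was previously given.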
