There are a few fields that SpamFilter uses.
First, those inserted by SpamFilter.insertInputFields():
1) A UTF-8 detector. We put a single non-Latin1 character in the
form to make sure the input is not mangled by badly behaving
robots or user agents. This used to be a major problem a few years
back, and the detector put an end to all of the "hey, my edit
destroyed all UTF-8 characters" problems. It also turns out that many
of the older bots just assume a form is Latin1.
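To make the detector idea concrete, here is a minimal sketch of such a
check. It is not the actual SpamFilter code: the class name, the method
names and the marker character are all made up for illustration, and
the only assumption is that the form carries one known non-Latin1
character that must come back unchanged.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT JSPWiki's real implementation.
final class EncodingCheck {
    // Any character outside Latin1 will do; this particular one
    // (HIRAGANA LETTER SMALL A) is an arbitrary choice for the sketch.
    static final String MARKER = "\u3041";

    // True only if the submitted value is exactly the marker we sent.
    static boolean survivedRoundTrip(String submitted) {
        return MARKER.equals(submitted);
    }

    // Simulates a client that reads the UTF-8 bytes of the form
    // as if they were Latin1, which is the failure being detected.
    static String mangleAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }
}
```

A POST whose marker field fails survivedRoundTrip() has been through
something that does not handle UTF-8, so the edit can be refused before
it damages the page.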
Second, those currently in plain.jsp (which should really be moved to
insertInputFields()):
2) An input field with a random name. This means that a bot will need
to actually GET the form first and parse it out before it can send
syntactically correct POSTs. This is a LOT more effort than simply
looking at the fields once and crafting your auto-poster to
conform.
3) A hidden input field which is meant to catch those bots which do a
GET and then randomly fill all fields with garbage. This field MUST
be empty when SpamFilter examines the contents of the POST. Since it's
hidden with CSS, the bot would need to understand CSS to bypass this
one. The field name is also randomized, so that nobody can hardcode
the fact that it needs to be empty.
The idea behind 2 & 3 is that we've got two fields with random names,
one of which needs to be empty and one of which needs to be filled
with pre-determined data. This is quite hard for most bots to get right,
unless they are specifically crafted for JSPWiki and contain some
amount of logic to figure this one out. In the future we could also
do full input field name randomization and even random reordering of
the input fields to make it even more difficult.
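As a rough illustration of how 2 and 3 fit together, here is a small
sketch. Everything in it is hypothetical (the FormChallenge class, its
field names, the way the random names are generated), and it assumes
the server remembers the challenge between the GET and the POST, for
example in the session. How SpamFilter actually derives and tracks the
names is not shown here.

```java
import java.security.SecureRandom;
import java.util.Map;

// Hypothetical sketch of the two-random-fields check, not JSPWiki code.
final class FormChallenge {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String tokenField;    // random name; must come back with expectedValue
    final String trapField;     // random name; hidden via CSS, must come back empty
    final String expectedValue; // the pre-determined data for tokenField

    FormChallenge() {
        String token = randomName();
        String trap;
        do {
            trap = randomName();
        } while (trap.equals(token)); // the two names must differ
        this.tokenField = token;
        this.trapField = trap;
        this.expectedValue = randomName();
    }

    // Twelve random lowercase letters after a fixed first letter,
    // so the result is always a valid form field name.
    private static String randomName() {
        StringBuilder sb = new StringBuilder("f");
        for (int i = 0; i < 12; i++) {
            sb.append((char) ('a' + RANDOM.nextInt(26)));
        }
        return sb.toString();
    }

    // Assumption: the trap field must be present AND empty, which is what
    // a browser sends for a text input the user never saw.
    boolean passes(Map<String, String> params) {
        String token = params.get(tokenField);
        String trap = params.get(trapField);
        return expectedValue.equals(token) && "".equals(trap);
    }
}
```

A bot that replays a POST recorded earlier fails because the field names
no longer match, and a bot that fills every field with garbage fails
because the trap field is not empty.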
(Once a POST has passed these simple tests, the content-based
analysis starts. But these are sufficient to catch ~95% of all spam.)
/Janne
On Mar 4, 2009, at 07:46, Andrew Jaquith wrote:
Janne,
Could you give me a little background on how the SpamFilter hash
fields work in 2.8? I've been able to replicate most of the behavior
of the plain editor as ActionBean event handlers & Stripe-ified JSPs,
but I haven't done so with the spam-filtering/hash fields just yet.
My primary reason for asking (other than trying to understand how it
works): I'm wondering if there's something we might be able to do here
that is related to CSRF prevention.
Andrew