Dear clapf-users,

clapf has a performance option that skips spam checking when a message
is bigger than a certain limit ('max_message_size_to_filter'), which
saves looking up lots of tokens. Since spammers usually send small
(few kB) messages, this limit fits most of the time.

But what if you get a spam with a binary attachment (such as gif,
pdf, mp3, ...)? Although it has only a few hundred tokens, clapf will
skip the spam check and the message will land in your Inbox.

Now you have two options:

Option "A": increase the value of 'max_message_size_to_filter' to a
suitable size.

Option "B": leave 'max_message_size_to_filter' at its default 64kB
value, and use the latest nightly build. I have introduced a new
configuration variable called 'max_number_of_tokens_to_filter'.

If a message is bigger than 'max_message_size_to_filter', clapf
checks whether it has fewer tokens than 'max_number_of_tokens_to_filter'.
If so, clapf checks the message anyway: it is big, but it has few
tokens.
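
To make the rule concrete, here is a rough sketch of the decision in
Python. It is only an illustration, not clapf's actual code: the
function name should_filter is made up, the two constants simply mirror
the configuration variables, and treating the 64kB limit as 65536 bytes
is an assumption.

```python
# Illustrative sketch only; not taken from the clapf source.
# Assumption: the size limit is expressed in bytes (64kB = 64 * 1024).
MAX_MESSAGE_SIZE_TO_FILTER = 64 * 1024
MAX_NUMBER_OF_TOKENS_TO_FILTER = 2000


def should_filter(message_size: int, token_count: int,
                  max_size: int = MAX_MESSAGE_SIZE_TO_FILTER,
                  max_tokens: int = MAX_NUMBER_OF_TOKENS_TO_FILTER) -> bool:
    """Return True if the message should go through the spam check."""
    # Messages within the size limit are always checked.
    if message_size <= max_size:
        return True
    # A big message is still checked when it has fewer tokens than the
    # token limit, e.g. a short spam carrying one binary attachment.
    return token_count < max_tokens


if __name__ == "__main__":
    # A 70 kB message with 520 tokens is checked despite its size.
    print(should_filter(70 * 1024, 520))
    # A 70 kB message with 5000 tokens is skipped.
    print(should_filter(70 * 1024, 5000))
```

With these defaults, a 70 kB spam with a single attachment and about
500 tokens is checked, while a 70 kB message with several thousand
tokens is still skipped.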

I did a quick survey of my spam collection and found that none of
the 50-82 kB spam messages with one gif/jpeg/pdf attachment has more
than 520 tokens.

By default max_number_of_tokens_to_filter=2000, so it should suffice. Of
course you can fine-tune this parameter, too, to suit your needs.

If you choose option "B", take care with the upgrade process and read
the UPGRADE file. If in doubt, just ask.

Best regards,
Janos
