Yeah, yeah, some of you will say "ah, at last, it's been a long time".
So, being on vacation until the end of the week, I could spend the day
sorting out all the spam from the ML and using it to train bogofilter,
after which I installed it.
The good news is that after careful training using a database
dedicated to the mailing list only, I managed to tune it to reach
zero false positives while letting just a few spams slip through. This
means it could be used to filter out the mails instead of just
tagging them.
The dataset consisted of 30700 e-mails, all delivered to this list over
the last 9 years, 6600 of which are spam. The filter managed to catch
111 spams I had missed by hand and to spot 28 mails I had accidentally
tagged as spam. I'll intentionally let the ones marked "unsure" pass
through, as the vast majority of them are valid e-mails; only 53 were
real spam over the last 9 years, so we don't care as long as we just get
a few tens a year.
For now it only adds the "x-bogosity" header to the e-mail and still
delivers it so that I can monitor the activity, but the goal is to
switch very quickly to dropping those marked as spam (which are the
majority of those people complain about).
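
For those curious, the intended policy (drop what is marked as spam,
let "unsure" and everything else through) can be sketched in a few
lines of Python. This is only an illustration, not what runs on the
list server: the function names are made up, and it assumes the header
value starts with the classification word followed by a comma, as in
"Spam, tests=bogofilter, ...".

```python
from email import message_from_string


def bogosity(raw_message):
    """Return the lowercased classification from the X-Bogosity header,
    or None when the header is absent (hypothetical helper)."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")  # header lookup is case-insensitive
    if header is None:
        return None
    # Assumed format: "<classification>, key=value, ..."
    return header.split(",", 1)[0].strip().lower()


def should_deliver(raw_message):
    """Drop only what is explicitly marked as spam; "unsure", ham and
    untagged mails are all delivered."""
    return bogosity(raw_message) != "spam"
```

Letting untagged mails through is a deliberate choice in this sketch:
if the filter fails to run, nothing gets lost.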
I made a few configuration changes for this in the delivery path, but
nothing that should be visible except this new header. I've just seen
the latest spam marked as such; after a few more, I'll configure it
to block. If you notice that an e-mail from you seems to get blocked
or to be bouncing, please do report it to me directly so that I can
check what is happening.