Yeah yeah, some of you will say "ah, at last, it's been long". So, being on vacation till the end of the week, I could spend the day sorting out all the spam from the ML and using it to train bogofilter, after which I installed it.
The good point is that after careful training using a database dedicated to the mailing list only, I managed to adjust it to reach zero false positives while letting just a few spams slip through. This means it could be used to filter out the mails instead of just tagging them.

The dataset consisted of 30700 e-mails, all delivered to this list over the last 9 years, 6600 of which are spam. The filter managed to catch 111 spams I had missed by hand and to spot 28 mails I had accidentally tagged as spam. I'll intentionally let the ones marked "unsure" pass through, as the vast majority of them are valid e-mails; only 53 were real spam over the last 9 years, so we don't care as long as we just get a few tens a year.

For now it only adds the "x-bogosity" header to the e-mail and still delivers it so that I can monitor the activity, but the purpose is to very quickly switch to dropping those marked as spam (which are the majority of those people complain about). I made a few configuration changes for this in the delivery path, but nothing that should be visible except this new header. I've just seen the latest spam marked as such; after a few more, I'll configure it to block.

If you notice that an e-mail from you seems to get blocked or to be bouncing, please do report it to me directly so that I can check what is happening.

Cheers,
Willy

