On Mon, July 2, 2012 22:57, Jari Fredriksson wrote:
> On 2.7.2012 22:01, John Hardin wrote:
>> On Mon, 2 Jul 2012, Jari Fredriksson wrote:
>>
>>> On 2.7.2012 19:23, [email protected] wrote:
>>>> On 07/02, Jari Fredriksson wrote:
>>>>> I follow the wiki page. I have now implemented the following
>>>>
>>>> It seems you are interpreting the wiki as a flawless authority, when it
>>>> would probably be more appropriate to consider it a crufty guideline that
>>>> one of us should get around to updating.
>>>>
>>>> http://wiki.apache.org/spamassassin/CorpusCleaning
>>>>
>>>> Which part of that page made you feel you should strip out facebook?
>>>>
>>>
>>> http://wiki.apache.org/spamassassin/HandClassifiedCorpora?highlight=%28facebook%29
>>>
>>
>> That says to not include any _spams_ received via those channels, not to
>> discard them _in toto_.
>
> Thanks! A good catch I guess, thanks for pointing out this failure in my
> reading comprehension. But those spammy messages which opened this very
> thread will be still removed, as they were in HAM corpus and rightly so.
>
It's actually quite hard to remove those from the spam corpus, as they can and will carry a forged linkedin.com or facebook.com Received: header. I have to check each of the damn spams by hand, really carefully.
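A minimal sketch of how the manual check could at least be narrowed down, assuming (hypothetically) a spam corpus stored as a directory with one raw message per file. `SOCIAL_DOMAINS`, `received_hosts`, `mentions_social` and `flag_for_review` are illustrative names, not part of SpamAssassin. The script only lists messages whose Received: headers name a facebook.com or linkedin.com host. Because those headers can be forged, the result is a shortlist to review by hand, not something to delete automatically.

```python
import email
import email.policy
import re
from pathlib import Path

# Hypothetical list of domains whose mail should be kept out of the spam corpus.
SOCIAL_DOMAINS = ("facebook.com", "linkedin.com")

# Captures the host after "from" or "by" in a Received: header. These names are
# whatever the relaying server wrote, so a spammer can forge them.
HOST_RE = re.compile(r"\b(?:from|by)\s+([A-Za-z0-9.-]+)", re.IGNORECASE)


def received_hosts(raw: bytes) -> list[str]:
    """Return every from/by host named in the message's Received: headers."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    hosts = []
    for header in msg.get_all("Received", []):
        hosts.extend(h.lower().rstrip(".") for h in HOST_RE.findall(str(header)))
    return hosts


def mentions_social(hosts: list[str]) -> bool:
    """True if any host is one of SOCIAL_DOMAINS or a subdomain of one."""
    return any(h == d or h.endswith("." + d) for h in hosts for d in SOCIAL_DOMAINS)


def flag_for_review(corpus_dir: str) -> list[Path]:
    """List corpus files that claim to have passed through a social-network host."""
    flagged = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file() and mentions_social(received_hosts(path.read_bytes())):
            flagged.append(path)
    return flagged
```

Something like `for p in flag_for_review("spam/"): print(p)` (the path is a placeholder) would print the candidates; each one still has to be read to decide whether the Received: chain is genuine or forged.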
