On Thu, Apr 11, 2013 at 5:28 PM, David Rees <dree...@gmail.com> wrote:
> On Wed, Apr 10, 2013 at 12:49 PM, David Rees <dree...@gmail.com> wrote:
>> So it appears that the training is completely ineffective, as I understand
>> that once an email has been marked as spam, it should no longer be
>> considered for the whitelist.
>>
>> If you look at the dspam log or look at the dspam webui, the history page
>> seems to indicate that the email is indeed being retrained successfully.
>>
>> Now, if I take one of these emails, remove the dspam headers and train it
>> as an inoculation source, after retraining around 5 times the email will be
>> successfully marked as spam.
>>
>> How can I debug this issue further?
>
> So at this point, it's very clear that error-training is not working
> at all. Any hints before I start digging into the code?

So I've done some digging into the code (time to take this to the -dev
list?), and it appears that training does increment spam_hits for some
tokens, just not the right ones.

Looking at the generated SQL, something appears to be wrong with the
token values that are pulled out of the signature: instead of training
on the tokens used during classification/processing, error correction
trains on different token values.

It looks like raw structs are encoded into the signature data field and
converted to unsigned long long values. The PostgreSQL backend stores
those unsigned long long values as bigint (signed). Somewhere in between
things are going wrong, but I'm still not sure where.

-Dave
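
For illustration only, here is a minimal Python sketch of the kind of
unsigned/signed mismatch suspected above. It is not DSPAM code: the
function names, the token value, and the in-memory "table" are all
hypothetical. It only shows that a 64-bit token with the high bit set
does not fit in a signed bigint unless its bits are reinterpreted, and
that a lookup using the unconverted value then misses the stored row.
Whether this is what actually happens in the PostgreSQL driver is still
unverified.

```python
import struct

def to_signed_bigint(token: int) -> int:
    """Reinterpret an unsigned 64-bit token as a signed 64-bit value (same bits)."""
    return struct.unpack("<q", struct.pack("<Q", token))[0]

def to_unsigned_token(value: int) -> int:
    """Inverse: reinterpret a signed 64-bit value as an unsigned 64-bit token."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]

# Hypothetical token with the high bit set (too large for a signed bigint).
token = 0xF000000000000001

# Hypothetical stand-in for the token table: key is the signed value
# as it would have to be stored in a bigint column.
table = {to_signed_bigint(token): {"spam_hits": 0}}

# Looking up the raw unsigned value misses the row ...
assert token not in table

# ... while converting it the same way as at store time finds it.
row = table[to_signed_bigint(token)]
row["spam_hits"] += 1

# The conversion is lossless in both directions.
assert to_unsigned_token(to_signed_bigint(token)) == token
```

If the conversion is applied at classification time but skipped (or done
differently) when tokens are read back out of the signature, retraining
would update different rows than the ones used for classification, which
would match the symptoms described above.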