On Thu, Apr 11, 2013 at 5:28 PM, David Rees <dree...@gmail.com> wrote:
> On Wed, Apr 10, 2013 at 12:49 PM, David Rees <dree...@gmail.com> wrote:
>> So it appears that the training is completely ineffective, as I understand
>> that once an email has been marked as spam, dspam should no longer consider
>> the email for the whitelist.
>>
>> If you look at the dspam log or look at the dspam webui, the history page
>> seems to indicate that the email is indeed being retrained successfully.
>>
>> Now, if I take one of these emails, remove the dspam headers and train it
>> as an inoculation source, after retraining around 5 times the email will be
>> successfully marked as spam.
>>
>> How can I debug this issue further?
>
> So at this point, it's very clear that error-training is not working
> at all. Any hints before I start digging in to the code?

So I've done some digging into the code (time to take this to the -dev
list?) and it appears that training does increment spam_hits for
some tokens, but they're not the right tokens.

Looking at the SQL generated, it appears that there's something wrong
with the token values that are pulled out of the signature, so instead
of training on the correct tokens used during
classification/processing, during error correction it's training on
different token values.

It looks like raw structs are encoded into the signature data field
and converted to unsigned long long values. The PostgreSQL backend
stores those unsigned long long values as bigint (signed 64-bit).
Somewhere in between things are going wrong, but I'm still not sure
where.
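
To make the unsigned/signed concern concrete, here is a rough sketch in
Python of the round trip I'd expect to hold. This is not dspam's code: the
helper names and the sample token value are made up for illustration, and
I'm only assuming the token is reinterpreted bit-for-bit when it is stored
as a bigint. If the real code path does anything other than a lossless
reinterpretation, tokens with the high bit set would come back as different
values.

```python
import struct

MASK64 = (1 << 64) - 1  # all 64 bits set


def to_signed_bigint(token: int) -> int:
    """Reinterpret an unsigned 64-bit token as a signed 64-bit value.

    The bit pattern is unchanged; only the interpretation differs.
    (Hypothetical helper, not a dspam function.)
    """
    return struct.unpack("<q", struct.pack("<Q", token & MASK64))[0]


def to_unsigned_token(value: int) -> int:
    """Reinterpret a signed 64-bit value as the original unsigned token.

    (Hypothetical helper, not a dspam function.)
    """
    return struct.unpack("<Q", struct.pack("<q", value))[0]


# Made-up token with the high bit set, the case where signedness matters.
token = 0xF000000000000001

stored = to_signed_bigint(token)      # negative when the high bit is set
restored = to_unsigned_token(stored)  # should equal the original token

print("token:   ", token)
print("stored:  ", stored)
print("restored:", restored)
print("round trip ok:", restored == token)
```

Comparing the token values logged at classification time against the ones
in the retraining SQL with a check like this should show whether the
mismatch only affects tokens above 2^63 or all of them.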

-Dave

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
