>ASSP will extract the headers and body and perform
some checks to see if it already "saw" that file

This is exactly how it has worked for years. I think we had this topic some 
months ago, Andrea. However, good ideas come back to mind every time! 
:):):) 
It has to work this way: if a mail is reported as ham, it is possibly already 
in the spam folder, and once it has been reported we have to ignore the copy 
in the spam folder (and vice versa for spam->ham).
An MD5 hash is calculated over every mail body!
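
To illustrate the idea only (this is not ASSP's actual code), here is a minimal Python sketch of duplicate detection by body hash. It assumes the body is everything after the first blank line, that folders are handed over in priority order with the reported (correction) folders first, and that line endings are normalized before hashing. The names split_message, body_md5 and select_for_rebuild are hypothetical.

```python
import hashlib


def split_message(raw: bytes):
    """Split a raw mail into (headers, body) at the first blank line."""
    raw = raw.replace(b"\r\n", b"\n")  # normalize line endings (assumption)
    headers, _, body = raw.partition(b"\n\n")
    return headers, body


def body_md5(raw: bytes) -> str:
    """MD5 hex digest over the body only; headers do not influence it."""
    _, body = split_message(raw)
    return hashlib.md5(body).hexdigest()


def select_for_rebuild(folders):
    """Keep the first occurrence of every body, skip later copies.

    folders: list of (label, [raw mails]) in priority order, so a mail
    reported as ham wins over the same body found in the spam folder.
    """
    seen = set()
    kept = []
    for label, mails in folders:
        for raw in mails:
            digest = body_md5(raw)
            if digest in seen:
                continue  # body already seen: ignore headers and body
            seen.add(digest)
            kept.append((label, raw))
    return kept
```

With the reported-ham folder listed before the spam folder, a copy of the same body in the spam folder is dropped regardless of its headers.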

>slightly different content
HMM eliminates this problem!

>for example it may just process
>(consider) the headers
Headers are simply too different in terms of Bayes and HMM to give good 
results. No human language is used there except in the subject. The rebuild 
retrieves some tags from the headers to get information for the user- and/or 
domain-based spamdb and hmmdb. However, if the body was already seen, 
the header is ignored as well.

Thomas



From:    Grayhat <[email protected]>
To:      [email protected], 
Date:    11.09.2012 18:02
Subject: Re: [Assp-test] Antwort:  strange ASSP behavior



 
> I'll explain a bit more:
> 
> - all folders are processed : "the youngest files first"
> - both error folders are fully processed up to MaxFiles
> 
> As the result of processing the first two folders we get a weight 
> (spam/ham). Now we know where we are: we have a current weight, a
> wanted weight, and we know how many files are in the spam and notspam
> folders. Now assp calculates the maximum number of files in the spam
> folder that could approximately be used, if we assume that at least all
> files in the notspam folder will be enough to reach the wanted target norm.
> The spam folder is processed.
> Now we know the new spam/ham weight and can calculate more exactly
> how many of the files in the notspam folder are required to reach the
> wanted target norm.
> 
> I'm impressed by how exactly it worked in my case.
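
As a rough illustration of the two-step estimate quoted above, here is a small Python sketch. It is only one reading of that description, not ASSP's real rebuild logic: it assumes every file contributes a weight of 1, that the target norm is the wanted spam/ham ratio, and that MaxFiles caps each folder. The function names and parameters are hypothetical.

```python
import math


def max_spam_files(spam_w, ham_w, n_spam, n_ham, target, max_files):
    """Step 1: upper bound on the spam-folder files to use.

    Assumes all n_ham files of the notspam folder will be used, so the
    total spam weight may grow to target * (ham_w + n_ham).
    """
    allowed = target * (ham_w + n_ham) - spam_w
    return max(0, min(n_spam, max_files, math.floor(allowed)))


def needed_ham_files(spam_w, ham_w, n_ham, target, max_files):
    """Step 2: notspam files needed once the spam weight is known."""
    needed = spam_w / target - ham_w
    return max(0, min(n_ham, max_files, math.ceil(needed)))


# Weights after the two error folders (made-up numbers).
spam_w, ham_w = 100, 50
use_spam = max_spam_files(spam_w, ham_w, n_spam=5000, n_ham=1000,
                          target=1.0, max_files=14000)
spam_w += use_spam  # the spam folder has now been processed
use_ham = needed_ham_files(spam_w, ham_w, n_ham=1000,
                           target=1.0, max_files=14000)
```

The point of the second step is that the first one works with an estimate only; after the spam folder is done, the real spam weight is known and the notspam count can be fixed more precisely.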

mumble (thinking out loud); our problem (if we want to call it so) is that
we may have multiple spam/ham files with the same contents but
different headers or even with slightly different content... now, let's
leave the latter alone for the moment; let's try thinking about those
"similar" files (same body, different headers); in such a case we may
consider some mechanism so that, whenever (storing ? rebuilding ?)
processing them, ASSP will extract the headers and body and perform
some checks to see if it already "saw" that file (e.g. using a DB table
containing hashes or the like) and, if so, ASSP may just avoid
processing the whole "additional file"; for example it may just process
(consider) the headers and skip the body (since it already saw it); I'm
not sure it makes sense, again, I'm just thinking out loud here...

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test




DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally 
privileged and protected in law and are intended solely for the use of the 
individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no 
known virus in this email!
*******************************************************


