Andrea,

your request was very logical. Why is ASSP not able to produce a fine 
corpusnorm/spamdb/HMM when all the information is available and the folders 
are full of files?
I had a sleepless night over this.
I think I've found a way to fix it.

After the error folders are processed, a temporary corpusnorm is 
calculated. The files in the spam and notspam folders are counted, and 
from the temporary corpusnorm, the spam file count and the notspam file 
count, the approximate number of spam files required is calculated.
Once these spam files have been processed, the approximate number of 
notspam files required is calculated from the notspam word count that is 
still needed.

So (I hope), even if a machine gets too many or too few spams over a 
period of time, this logic will be able to ensure a fine corpusnorm.
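The two-step estimate described above could be sketched roughly as follows. This is not the ASSP code (ASSP is written in Perl and the real formula is not shown in this mail); the function names, the use of an average word count per file, and the target norm of 1.0 are all assumptions made for illustration.

```python
import math


def corpus_norm(spam_words: float, ham_words: float) -> float:
    """Spam words divided by ham words; 1.0 means a balanced corpus."""
    return spam_words / ham_words


def files_required(words_missing: float, avg_words_per_file: float) -> int:
    """Approximate number of files needed to supply `words_missing` words."""
    if words_missing <= 0 or avg_words_per_file <= 0:
        return 0
    return math.ceil(words_missing / avg_words_per_file)


def spam_files_required(spam_words, ham_words, notspam_files,
                        avg_spam_words, avg_ham_words, target_norm=1.0):
    """Step 1 (assumed): estimate the spam files needed, expecting that all
    notspam files will be imported afterwards."""
    expected_ham = ham_words + notspam_files * avg_ham_words
    missing = target_norm * expected_ham - spam_words
    return files_required(missing, avg_spam_words)


def notspam_files_required(spam_words, ham_words, avg_ham_words,
                           target_norm=1.0):
    """Step 2 (assumed): with the real spam word count known, estimate the
    notspam files needed to reach the target norm."""
    missing = spam_words / target_norm - ham_words
    return files_required(missing, avg_ham_words)


# Made-up numbers, not taken from any real rebuild:
# word counts after the error folders: 1000 spam words, 1200 ham words
need_spam = spam_files_required(1000, 1200, notspam_files=10,
                                avg_spam_words=100, avg_ham_words=50)
spam_to_process = min(need_spam, 20)      # 20 spam files available
# after importing those spam files the spam word count is 1700
need_ham = notspam_files_required(1700, 1200, avg_ham_words=50)
ham_to_process = min(need_ham, 10)        # 10 notspam files available
print(need_spam, need_ham)                # prints: 7 10
```

The required count is deliberately not capped inside the helper; the caller takes the minimum of the required and the available files. That mirrors what the log below appears to do, where 617 notspam files are required but only 599 are available and processed.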

Sep-11-12 11:25:36 c:/assp/errors/spam
Sep-11-12 11:25:36 File Count:  1,039
Sep-11-12 11:25:36 Processing... errors/spam with 1,039 files
Sep-11-12 11:25:37 ignore and remove files older than Dec-16-09 10:25:36 in folder errors/spam
Sep-11-12 11:28:50 208 attachment/image entries processed
Sep-11-12 11:28:50 Imported Files:      1,037
Sep-11-12 11:28:50 Finished in 194 second(s)

Sep-11-12 11:28:50 c:/assp/errors/notspam
Sep-11-12 11:28:50 File Count:  553
Sep-11-12 11:28:50 Processing... errors/notspam with 553 files
Sep-11-12 11:28:52 ignore and remove files older than Dec-16-09 10:28:50 in folder errors/notspam
Sep-11-12 11:30:42 96 attachment/image entries processed
Sep-11-12 11:30:42 Imported Files:      551
Sep-11-12 11:30:42 Finished in 112 second(s)
Sep-11-12 11:30:42 info: corpusnorm after processing errors/spam and errors/notspam is spamwords 1111618/ hamwords 1285508 => 0.864730518985491

Sep-11-12 11:30:42 info: require 1858 files from folder spam to get a fine corpusnorm

Sep-11-12 11:30:42 c:/assp/spam
Sep-11-12 11:30:42 File Count:  2,149
Sep-11-12 11:30:42 Processing... spam with 1,858 files
Sep-11-12 11:30:44 ignore and remove files older than Aug-25-12 11:30:42 in folder spam
Sep-11-12 11:35:46 Removed Old: 5
Sep-11-12 11:35:46 36 attachment/image entries processed
Sep-11-12 11:35:46 Imported Files:      1,858
Sep-11-12 11:35:46 Finished in 304 second(s)
Sep-11-12 11:35:46 info: require 617 files from folder notspam to get a fine corpusnorm

Sep-11-12 11:35:46 c:/assp/notspam
Sep-11-12 11:35:46 File Count:  599
Sep-11-12 11:35:46 Processing... notspam with 599 files
Sep-11-12 11:35:46 ignore and remove files older than Jul-06-12 11:35:46 in folder notspam
Sep-11-12 11:37:02 66 attachment/image entries processed
Sep-11-12 11:37:02 Imported Files:      597
Sep-11-12 11:37:02 Finished in 76 second(s)
 ......

Sep-11-12 11:39:26 Spam Weight:    1,522,594
Sep-11-12 11:39:26 Not-Spam Weight:   1,499,397

Sep-11-12 11:39:26 Corpus norm: 1.0155 - (very good - balanced)
Sep-11-12 11:39:26 Corpus confidence:   1.00000000
 

The result of the last rebuild without this logic on the same corpus was:

Sep-11-12 04:15:32 Spam Weight:             1,605,461
Sep-11-12 04:15:32 Not-Spam Weight:   1,498,629

Sep-11-12 04:15:32 Corpus norm:          1.0713 - (very good - balanced)
Sep-11-12 04:15:32 Corpus confidence:            1.00000000
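As a quick cross-check of the figures above: the reported corpus norm matches the spam weight divided by the not-spam weight in both rebuilds, and the intermediate value matches spamwords divided by hamwords. The short sketch below only redoes that arithmetic on the numbers from the log; the variable names are mine.

```python
# (spam weight, not-spam weight) as reported in the two rebuild logs
with_new_logic = (1_522_594, 1_499_397)
without_new_logic = (1_605_461, 1_498_629)
# spamwords / hamwords after the errors folders were processed
after_error_folders = (1_111_618, 1_285_508)

norm_new = with_new_logic[0] / with_new_logic[1]
norm_old = without_new_logic[0] / without_new_logic[1]
norm_tmp = after_error_folders[0] / after_error_folders[1]

print(f"{norm_new:.4f}")   # 1.0155, as in the 11:39:26 report
print(f"{norm_old:.4f}")   # 1.0713, as in the 04:15:32 report
print(f"{norm_tmp:.4f}")   # 0.8647, the temporary corpusnorm
```

So the new logic moved the norm from about 7.1% spam-heavy to about 1.5% spam-heavy on this corpus.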


Thomas






From:    Grayhat <[email protected]>
To:      [email protected], 
Date:    10.09.2012 13:14
Subject: [Assp-test] strange ASSP behavior




I'm running the latest ASSP 2.2.2 build 12248 (Win2k8, ActivePerl,
MSSQL), but I observed the same behavior with previous versions as
well. In short, if I manually "trim" the spam/notspam folders down to
14000 files (or fewer, but the same count for both) and start a rebuild,
the rebuild report tells me that the corpus is OK (balanced). But if I
then leave ASSP running for (say) a week or so (the box gets quite a lot
of traffic), the spam folder keeps growing and the corpus quickly moves
to "slight spam heavy" and then "too spam heavy", and the ASSP
"automatic cleanup" doesn't seem to help. The rebuild deletes some
"excess files", but the number deleted seems small compared to the
number of files stored. The relevant parameters in my config (or at
least the ones that should be relevant) are:

MaxFiles: 14000

FilesDistribution: 1

MaxAllowedDups: 5

MaxBayesFileAge: 30 10

MaxKeepDeleted: 10

MaxCorrectedDays: 1000

MaxNoBayesFileAge: 15

autoCorrectCorpus: 0.6-1.4-4000-10

Now... is this a bug or expected behaviour? And if it's expected,
what can I do (e.g. a config change) to avoid this issue?


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test





