Andrea,
your request was very logical. Why is ASSP not able to produce a fine
corpusnorm/spamdb/HMM if all the information is available and the folders are
full of files?
I had a sleepless night.
I think I've found a way to fix this.
After the error folders are processed, a temporary corpusnorm is
calculated. The files in the spam and notspam folders are counted, and
depending on the temporary corpusnorm, the spam file count and the
notspam file count, the approximate required number of spam files is
calculated. Once these spam files have been processed, the approximate
required number of notspam files is calculated, based on the needed
notspam word count.
So (I hope), even if a machine gets too many or too few spams over time,
this logic will be able to ensure a fine corpusnorm.
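The two-step idea described above can be sketched roughly as follows. This is only an illustration and not the actual ASSP code: the function names, the use of an average words-per-file estimate, and the target of a norm near 1.0 are all assumptions, so the numbers it produces will not necessarily match the counts in the log below.

```python
import math


def required_files(words_needed, avg_words_per_file, files_available):
    """Approximate how many files are needed to collect words_needed words.

    Hypothetical helper: never asks for more files than the folder holds.
    """
    if words_needed <= 0 or avg_words_per_file <= 0:
        return 0
    return min(files_available, math.ceil(words_needed / avg_words_per_file))


def plan_spam_files(spam_words, ham_words, spam_files, notspam_files,
                    avg_spam_words, avg_ham_words):
    """Step 1: estimate the spam file count from the temporary word counts.

    Assumption: the ham side is projected as if every notspam file were
    imported, and spam is topped up to reach the same word count.
    """
    expected_ham = ham_words + notspam_files * avg_ham_words
    return required_files(expected_ham - spam_words, avg_spam_words, spam_files)


def plan_notspam_files(spam_words_now, ham_words, notspam_files, avg_ham_words):
    """Step 2: after the spam files are imported, size the notspam import.

    The needed notspam word count is the gap to the actual spam word count.
    """
    return required_files(spam_words_now - ham_words, avg_ham_words, notspam_files)


# Illustrative numbers only (not taken from the log).
spam_needed = plan_spam_files(1000, 2000, 50, 10, 100, 200)
notspam_needed = plan_notspam_files(4000, 2000, 10, 200)
print(spam_needed, notspam_needed)
```

The point of splitting the calculation in two is that the second step can use the real spam word count instead of an estimate.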
Sep-11-12 11:25:36 c:/assp/errors/spam
Sep-11-12 11:25:36 File Count: 1,039
Sep-11-12 11:25:36 Processing... errors/spam with 1,039 files
Sep-11-12 11:25:37 ignore and remove files older than Dec-16-09 10:25:36
in folder errors/spam
Sep-11-12 11:28:50 208 attachment/image entries processed
Sep-11-12 11:28:50 Imported Files: 1,037
Sep-11-12 11:28:50 Finished in 194 second(s)
Sep-11-12 11:28:50 c:/assp/errors/notspam
Sep-11-12 11:28:50 File Count: 553
Sep-11-12 11:28:50 Processing... errors/notspam with 553 files
Sep-11-12 11:28:52 ignore and remove files older than Dec-16-09 10:28:50
in folder errors/notspam
Sep-11-12 11:30:42 96 attachment/image entries processed
Sep-11-12 11:30:42 Imported Files: 551
Sep-11-12 11:30:42 Finished in 112 second(s)
Sep-11-12 11:30:42 info: corpusnorm after processing errors/spam and
errors/notspam is spamwords 1111618/ hamwords 1285508 => 0.864730518985491
Sep-11-12 11:30:42 info: require 1858 files from folder spam to get a fine
corpusnorm
Sep-11-12 11:30:42 c:/assp/spam
Sep-11-12 11:30:42 File Count: 2,149
Sep-11-12 11:30:42 Processing... spam with 1,858 files
Sep-11-12 11:30:44 ignore and remove files older than Aug-25-12 11:30:42
in folder spam
Sep-11-12 11:35:46 Removed Old: 5
Sep-11-12 11:35:46 36 attachment/image entries processed
Sep-11-12 11:35:46 Imported Files: 1,858
Sep-11-12 11:35:46 Finished in 304 second(s)
Sep-11-12 11:35:46 info: require 617 files from folder notspam to get a
fine corpusnorm
Sep-11-12 11:35:46 c:/assp/notspam
Sep-11-12 11:35:46 File Count: 599
Sep-11-12 11:35:46 Processing... notspam with 599 files
Sep-11-12 11:35:46 ignore and remove files older than Jul-06-12 11:35:46
in folder notspam
Sep-11-12 11:37:02 66 attachment/image entries processed
Sep-11-12 11:37:02 Imported Files: 597
Sep-11-12 11:37:02 Finished in 76 second(s)
......
Sep-11-12 11:39:26 Spam Weight: 1,522,594
Sep-11-12 11:39:26 Not-Spam Weight: 1,499,397
Sep-11-12 11:39:26 Corpus norm: 1.0155 - (very good - balanced)
Sep-11-12 11:39:26 Corpus confidence: 1.00000000
The result of the last rebuild without this logic on the same corpus was:
Sep-11-12 04:15:32 Spam Weight: 1,605,461
Sep-11-12 04:15:32 Not-Spam Weight: 1,498,629
Sep-11-12 04:15:32 Corpus norm: 1.0713 - (very good - balanced)
Sep-11-12 04:15:32 Corpus confidence: 1.00000000
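To compare the two rebuilds, the reported norm can be recomputed from the weights. The sketch below assumes the corpus norm is simply the spam weight divided by the not-spam weight, which is consistent with the figures in both reports; the function name is made up for this example.

```python
def corpus_norm(spam_weight, ham_weight):
    """Ratio of spam weight to not-spam weight (assumed definition)."""
    if ham_weight <= 0:
        raise ValueError("ham_weight must be positive")
    return spam_weight / ham_weight


# Weights copied from the two rebuild reports.
with_logic = corpus_norm(1_522_594, 1_499_397)
without_logic = corpus_norm(1_605_461, 1_498_629)

print(f"with new logic:    {with_logic:.4f}")
print(f"without new logic: {without_logic:.4f}")
# A norm closer to 1.0 means a better balanced corpus.
print("improved:", abs(with_logic - 1.0) < abs(without_logic - 1.0))
```

With the new logic the norm is about 1.5% away from 1.0, compared to about 7.1% before.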
Thomas
From: Grayhat <[email protected]>
To: [email protected],
Date: 10.09.2012 13:14
Subject: [Assp-test] strange ASSP behavior
I'm running the latest ASSP 2.2.2 build 12248 (Win2k8, ActivePerl,
MSSQL), but I observed the same behavior with previous versions as
well. In short, if I manually "trim" the spam/notspam folders down to
14000 files (or fewer, but the same count for both) and start a rebuild,
the rebuild report tells me that the corpus is OK (balanced). But then, if
I leave ASSP running for (say) a week or so (the box gets quite a bunch
of traffic), the spam folder keeps growing and growing, and the corpus
quickly moves to "slight spam heavy" and then "too spam heavy". The
ASSP "automatic cleanup" doesn't seem to help: the rebuild deletes some
"excess files", but the number deleted seems small compared to the
number of files stored. The relevant parameters (or at least the ones
that should be relevant) in my config are:
MaxFiles: 14000
FilesDistribution: 1
MaxAllowedDups: 5
MaxBayesFileAge: 30 10
MaxKeepDeleted: 10
MaxCorrectedDays: 1000
MaxNoBayesFileAge: 15
autoCorrectCorpus: 0.6-1.4-4000-10
Now... is this a bug or expected behaviour? And if it's expected,
what can I do (e.g. a config change) to avoid this issue?
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally
privileged and protected in law and are intended solely for the use of the
individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no
known virus in this email!
*******************************************************