HMM may give fewer than 6 results if the mail is too short, or if a similar
mail has never been seen.
Thomas
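
A toy illustration of the two cases above. This is not ASSP's actual HMM code: the model layout (word pairs mapped to a spam score), the function names and the sample data are assumptions made purely for this sketch. Only the threshold of 6 is taken from the log message in the subject line.

```python
MIN_RESULTS = 6  # threshold quoted in the "less than 6 results" log message


def known_sequence_scores(text, model, order=2):
    """Collect scores for every word sequence of the mail that the model knows.

    `model` is a hypothetical dict mapping tuples of `order` consecutive
    words to a spam score; sequences the model never saw yield no result.
    """
    words = text.lower().split()
    scores = []
    for i in range(len(words) - order + 1):
        seq = tuple(words[i:i + order])
        if seq in model:
            scores.append(model[seq])
    return scores


def has_enough_results(text, model, order=2):
    """True if the lookup produced at least MIN_RESULTS scores."""
    return len(known_sequence_scores(text, model, order)) >= MIN_RESULTS


# Hypothetical model built from previously seen mail.
model = {
    ("buy", "cheap"): 0.90, ("cheap", "pills"): 0.95, ("pills", "now"): 0.90,
    ("now", "and"): 0.60, ("and", "save"): 0.80, ("save", "money"): 0.85,
    ("money", "today"): 0.70,
}

long_seen = "buy cheap pills now and save money today"
too_short = "buy cheap pills"
never_seen = "quarterly report attached please review the numbers before friday"

for mail in (long_seen, too_short, never_seen):
    print(len(known_sequence_scores(mail, model)), has_enough_results(mail, model))
```

In this sketch the short mail can only contribute two word pairs, and the unfamiliar mail matches none, so both stay below the threshold even though the model itself is populated.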
From: Dossy Shiobara <do...@panoptic.com>
To: For Users of ASSP <assp-user@lists.sourceforge.net>
Date: 30.01.2016 20:53
Subject: Re: [Assp-user] HMM-Check has given less than 6 results - using monitoring mode only
Okay, so... I'm going to include the entire snippet at the bottom of
this email, but I'm going to highlight sections here.
First:
Jan-30-16 12:05:01 [Worker_10001] File Count: 10,831
Jan-30-16 12:05:01 [Worker_10001] Processing... spam with 10,831 files
Jan-30-16 12:05:01 [Worker_10001] Ignore and remove files older than Dec-30-15 12:05:01 in folder spam
Jan-30-16 12:15:13 [Worker_10001] Removed Old: 81
10 minutes to remove 81 old files? I'm guessing it's stat()'ing each
and every file in some terribly inefficient way, because:
$ time find . -mtime +31 -ls | wc -l
0
real 0m0.048s
user 0m0.009s
sys 0m0.041s
$ time find . -mtime +30 -ls | wc -l
80
real 0m0.046s
user 0m0.005s
sys 0m0.043s
find(1) needs less than 0.05s to find all 80 files that are older than
30 days. Can I turn off ASSP's expiration of old files and just cron a
find/rm script to do it, if ASSP is going to take 10 minutes?
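
If the expiration were handled outside ASSP, the find/rm idea could look roughly like the Python sketch below. It is a hedged illustration only: whether ASSP's own expiration can be switched off is not established in this thread, and the function names and the dry-run flag are assumptions made for this sketch. It mirrors `find -mtime +N` (age in whole days strictly greater than N) but, unlike find, it does not descend into subdirectories.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_old_files(folder, days, now=None):
    """Return sorted paths of regular files in `folder` older than `days`.

    Like `find -mtime +days`, the age is truncated to whole days and must
    be strictly greater than `days`. Only the top level is scanned.
    """
    now = time.time() if now is None else now
    old = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            mtime = entry.stat(follow_symlinks=False).st_mtime
            age_days = int((now - mtime) // SECONDS_PER_DAY)
            if age_days > days:
                old.append(entry.path)
    return sorted(old)


def remove_old_files(folder, days, dry_run=True, now=None):
    """Delete the files found by find_old_files unless dry_run is set.

    Returns the list of affected paths either way, so a dry run can be
    inspected before anything is removed.
    """
    paths = find_old_files(folder, days, now=now)
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths
```

A cron job could call something like `remove_old_files("/data/assp/spam", 30, dry_run=False)`; running it with the default `dry_run=True` first only lists what would be deleted.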
Similarly, the scan of the notspam folder:
Jan-30-16 12:15:13 [Worker_10001] File Count: 6,917
Jan-30-16 12:15:13 [Worker_10001] Processing... notspam with 6,917 files
Jan-30-16 12:15:13 [Worker_10001] Ignore and remove files older than Dec-30-15 12:15:13 in folder notspam
Jan-30-16 12:25:13 [Worker_10001] Removed Old: 34
10 minutes? Is there some kind of sleep() in there that makes that step
take 10 minutes regardless of the time it takes to process the files?
10 minutes for 10,831 files and 10 minutes for 6,917 files ... the
duration does not scale linearly with the file count, which seems really
strange.
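
To put numbers on that, the elapsed time of each step can be computed from the log timestamps quoted above. This is a small sketch; the helper names are made up for illustration, and parsing the month with `%b` assumes an English/C locale.

```python
from datetime import datetime


def parse_timestamp(line):
    """Parse the leading 'Mon-DD-YY HH:MM:SS' stamp of an ASSP log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%b-%d-%y %H:%M:%S")


def elapsed_seconds(start_line, end_line):
    """Whole seconds between the timestamps of two log lines."""
    delta = parse_timestamp(end_line) - parse_timestamp(start_line)
    return int(delta.total_seconds())


def ms_per_file(seconds, file_count):
    """Average milliseconds spent per file."""
    return 1000.0 * seconds / file_count


spam_s = elapsed_seconds(
    "Jan-30-16 12:05:01 [Worker_10001] Processing... spam with 10,831 files",
    "Jan-30-16 12:15:13 [Worker_10001] Removed Old: 81",
)
notspam_s = elapsed_seconds(
    "Jan-30-16 12:15:13 [Worker_10001] Processing... notspam with 6,917 files",
    "Jan-30-16 12:25:13 [Worker_10001] Removed Old: 34",
)

print(f"spam:    {spam_s} s, {ms_per_file(spam_s, 10831):.1f} ms per file")
print(f"notspam: {notspam_s} s, {ms_per_file(notspam_s, 6917):.1f} ms per file")
```

That works out to 612 s and 600 s, or roughly 56.5 ms versus 86.7 ms per file in the folder, so the per-file cost differs noticeably between the two folders.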
And, I see:
Jan-30-16 12:28:14 [Worker_10001] Finished populating Hidden Markov Model! HMM-check is now enabled again!
Yet, I still get those "HMM-Check has given less than 6 results"
errors. Is something else missing?
___ $ grep Worker_10001 logs/maillog.txt ___
Jan-30-16 12:05:00 [Worker_10001] Info: found module /data/assp/lib/rebuildspamdb.pm version 7.26
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB uses BerkeleyDB for temporary hashes
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB-thread rebuildspamdb-version 7.26 started in ASSP version 2.4.7(16004)
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will create a Hidden Markov Model
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will create unicode enabled databases
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will process all words as Sequence of UAX #29 Grapheme Clusters
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will normalize unicode characters
Jan-30-16 12:05:00 [Worker_10001] RebuildSpamDB will use the ASSP_WordStem engine
Jan-30-16 12:05:00 [Worker_10001] Maxfiles: 14,000
Jan-30-16 12:05:00 [Worker_10001] RebuildFileTimeLimit: 1 5
Jan-30-16 12:05:00 [Worker_10001] RebuildFileTimeLimit: files will be moved away from the corpus if their processing takes longer than 5 second(s)
Jan-30-16 12:05:00 [Worker_10001] /data/assp/errors/spam
Jan-30-16 12:05:00 [Worker_10001] File Count: 11
Jan-30-16 12:05:00 [Worker_10001] Processing... errors/spam with 11 files
Jan-30-16 12:05:00 [Worker_10001] Ignore and remove files older than Sep-13-88 13:05:00 in folder errors/spam
Jan-30-16 12:05:00 [Worker_10001] Imported Files for HeloBlackList: 10
Jan-30-16 12:05:00 [Worker_10001] Imported Files for Bayes/HMM: 10
Jan-30-16 12:05:00 [Worker_10001] Finished in 1 second(s)
Jan-30-16 12:05:00 [Worker_10001] /data/assp/errors/notspam
Jan-30-16 12:05:00 [Worker_10001] File Count: 1
Jan-30-16 12:05:00 [Worker_10001] Processing... errors/notspam with 1 files
Jan-30-16 12:05:00 [Worker_10001] Ignore and remove files older than Sep-13-88 13:05:00 in folder errors/notspam
Jan-30-16 12:05:00 [Worker_10001] Imported Files for HeloBlackList: 0
Jan-30-16 12:05:00 [Worker_10001] Imported Files for Bayes/HMM: 0
Jan-30-16 12:05:00 [Worker_10001] Finished in 1 second(s)
Jan-30-16 12:05:00 [Worker_10001] Info: corpusnorm after processing errors/spam and errors/notspam is spamwords 8280/ hamwords 0 => 10.000
Jan-30-16 12:05:01 [Worker_10001] Info: require approx. 6,292 files (3,152,789 words) from folder spam to get the wanted corpusnorm (1.000)
Jan-30-16 12:05:01 [Worker_10001] /data/assp/spam
Jan-30-16 12:05:01 [Worker_10001] File Count: 10,831
Jan-30-16 12:05:01 [Worker_10001] Processing... spam with 10,831 files
Jan-30-16 12:05:01 [Worker_10001] Ignore and remove files older than Dec-30-15 12:05:01 in folder spam
Jan-30-16 12:15:13 [Worker_10001] Removed Old: 81
Jan-30-16 12:15:13 [Worker_10001] Imported Files for HeloBlackList: 10,750
Jan-30-16 12:15:13 [Worker_10001] Imported Files for Bayes/HMM: 6,338
Jan-30-16 12:15:13 [Worker_10001] Finished in 612 second(s)
Jan-30-16 12:15:13 [Worker_10001] Info: require approx. all files (3,161,976 words) from folder notspam to get the wanted corpusnorm (1.000)
Jan-30-16 12:15:13 [Worker_10001] /data/assp/notspam
Jan-30-16 12:15:13 [Worker_10001] File Count: 6,917
Jan-30-16 12:15:13 [Worker_10001] Processing... notspam with 6,917 files
Jan-30-16 12:15:13 [Worker_10001] Ignore and remove files older than Dec-30-15 12:15:13 in folder notspam
Jan-30-16 12:25:13 [Worker_10001] Removed Old: 34
Jan-30-16 12:25:13 [Worker_10001] Imported Files for HeloBlackList: 6,883
Jan-30-16 12:25:13 [Worker_10001] Imported Files for Bayes/HMM: 6,917
Jan-30-16 12:25:13 [Worker_10001] Finished in 600 second(s)
Jan-30-16 12:25:29 [Worker_10001] Populating 513541 Spamdb records - Bayesian check is now disabled
Jan-30-16 12:25:29 [Worker_10001] Try to lock Spamdb database in 5 second(s)
Jan-30-16 12:25:42 [Worker_10001] Done - populating Spamdb records - 513541 - Bayesian check is now enabled
Jan-30-16 12:25:42 [Worker_10001] Bayesian Pairs: 513,541 now in list
Jan-30-16 12:25:42 [Worker_10001] Generating consolidated Hidden-Markov-Model database from 3,740,686 record model
Jan-30-16 12:27:37 [Worker_10001] HMM sequences: 1,830,724 now in list
Jan-30-16 12:27:37 [Worker_10001] Generating Spamdb.helo records from 7,487 collected HELO's
Jan-30-16 12:27:37 [Worker_10001] Cleaning old Spamdb.helo records
Jan-30-16 12:27:37 [Worker_10001] Done - cleaning old Spamdb.helo records
Jan-30-16 12:27:37 [Worker_10001] HELO Blacklist: 1 new, 0 now in list
Jan-30-16 12:27:37 [Worker_10001] Try to lock HMM databases in 5 second(s)
Jan-30-16 12:27:42 [Worker_10001] Start populating Hidden Markov Model. HMM-check is disabled for this time!
Jan-30-16 12:27:42 [Worker_10001] Start populating Hidden Markov Model with 1,830,724 records!
Jan-30-16 12:28:14 [Worker_10001] Finished populating Hidden Markov Model with 1,830,724 records!
Jan-30-16 12:28:14 [Worker_10001] Finished populating Hidden Markov Model! HMM-check is now enabled again!
Jan-30-16 12:28:14 [Worker_10001] Total processing time: 1,394 second(s)
Jan-30-16 12:28:14 [Worker_10001] Total processed data: 116.19 MByte
Jan-30-16 12:28:14 [Worker_10001] Rebuild processed 14.53 files per second.
Jan-30-16 12:28:14 [Worker_10001] After finishing the Rebuild process, the /data/assp/tmpDB folder contains 899.74 MByte.
Jan-30-16 12:28:14 [Worker_10001] After finishing the Rebuild process, the drive that contains the /data/assp/tmpDB folder has 1.11 GByte free space from total 1.90 GByte.
Jan-30-16 12:28:14 [Worker_10001] Building new GripList records and bounce report
Jan-30-16 12:28:14 [Worker_10001] Processing Logfile /data/assp/logs/maillog.txt
Jan-30-16 12:28:14 [Worker_10001] Processing Logfile /data/assp/logs/16-01-29.maillog.txt
Jan-30-16 12:28:15 [Worker_10001] Processing Logfile /data/assp/logs/16-01-28.maillog.txt
Jan-30-16 12:28:15 [Worker_10001] Processing Logfile /data/assp/logs/16-01-27.maillog.txt
Jan-30-16 12:28:16 [Worker_10001] Processing Logfile /data/assp/logs/16-01-26.maillog.txt
Jan-30-16 12:28:16 [Worker_10001] Processing Logfile /data/assp/logs/16-01-25.maillog.txt
Jan-30-16 12:28:16 [Worker_10001] Downloading griplist.conf via direct HTTP connection
Jan-30-16 12:28:17 [Worker_10001] Griplist.conf already up to date
Jan-30-16 12:28:17 [Worker_10001] Info: loaded GRIPLIST upload and download URL's from /data/assp/griplist.conf
Jan-30-16 12:28:18 [Worker_10001] Submitted 5,583 bytes: 0 IPv6 addresses, 619 IPv4 addresses
Jan-30-16 12:28:18 [Worker_10001] Trashlist was saved to /data/assp/trashlist.db
On 1/30/16 6:42 AM, Alexandre de Arruda Paes wrote:
> I don't know if in BerkeleyDB the result is the same, but see my log below.
>
> # grep Worker_10001 maillog.txt
>
> jan-30-16 02:42:53 [Worker_10001] Try to lock HMM databases in 5 second(s)
> jan-30-16 02:42:59 [Worker_10001] Start populating Hidden Markov Model. HMM-check is disabled for this time!
> jan-30-16 02:42:59 [Worker_10001] Start populating Hidden Markov Model with 1.046.257 records!
> jan-30-16 02:42:59 [Worker_10001] Database import started for table hmmdb
> jan-30-16 02:43:01 [Worker_10001] Trying Bulkimport for table hmmdb
> jan-30-16 02:43:01 [Worker_10001] Database: MySQL 5.5.47-cll
> jan-30-16 02:43:03 [Worker_10001] Added 1000 of 1046257 records for table hmmdb - finished in 1045 sec
> jan-30-16 02:43:03 [Worker_10001] Added 2000 of 1046257 records for table hmmdb - finished in 522 sec
> jan-30-16 02:43:03 [Worker_10001] Added 3000 of 1046257 records for table hmmdb - finished in 347 sec
> jan-30-16 02:43:03 [Worker_10001] Added 4000 of 1046257 records for table hmmdb - finished in 260 sec
> (...)
> jan-30-16 02:44:40 [Worker_10001] Added 1036000 of 1046257 records for table hmmdb - finished in 0 sec
> jan-30-16 02:44:44 [Worker_10001] Bulkimport for table hmmdb finished
> jan-30-16 02:44:44 [Worker_10001] Successfully added 1046257 records in to table hmmdb
> jan-30-16 02:44:44 [Worker_10001] Finished populating Hidden Markov Model with 1.046.257 records!
> jan-30-16 02:44:44 [Worker_10001] Finished populating Hidden Markov Model! HMM-check is now enabled again!
>
> 2016-01-28 22:44 GMT-02:00 Dossy Shiobara <do...@panoptic.com>:
>
>> I am using BerkeleyDB. What does the log message string look like if it
>> was transferred correctly so I can search for it?
>>
>> On 1/28/16 5:30 PM, Alexandre de Arruda Paes wrote:
>>> If you use a database (like MySQL), search the maillog to check whether
>>> the records were transferred correctly after rebuilddb terminates.
>>> Here, if this occurs, the message is the same as yours.
>> --
>> Dossy Shiobara | "He realized the fastest way to change
>> do...@panoptic.com | is to laugh at your own folly -- then you
>> http://panoptic.com/ | can let go and quickly move on." (p. 70)
>> * WordPress * jQuery * MySQL * Security * Business Continuity *
>> _______________________________________________
>> Assp-user mailing list
>> Assp-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/assp-user
--
Dossy Shiobara | "He realized the fastest way to change
do...@panoptic.com | is to laugh at your own folly -- then you
http://panoptic.com/ | can let go and quickly move on." (p. 70)
* WordPress * jQuery * MySQL * Security * Business Continuity *