The installation is running the latest ASSP 2.6.3 (19169) build.

Perl: This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi (with 39 registered patches, see perl -V for more detail)

To address a number of inexplicable false positives, especially with HMM, I populated the disclaimers file with all of the company's commonly used disclaimers. However, a test message that contained little more than a disclaimer was still evaluated as spam by the Bayesian and HMM filters. So I took a look at the database itself and confirmed that the disclaimer was still there.

The disclaimer definition excerpt in question (starts from the beginning, end clipped off):

# separated and added common strings at beginning
Apple Authorized Reseller .
Apple Authorized Service Provider .
Apple Authorized Premium Service Provider .
Certified Members of the Apple Consultants Network .
Macintosh Consulting, Service, Sales .
Macintosh Consulting, Service, and Sales .
Apple Certified Macintosh Technician .
Apple Certified iOS Technician .
CompanyName is an iPhone powered company .
Find us at http://www.domain.mobi .
John Doe, President CompanyName Baltimore | Washington DC | Philadelphia Apple Authorized Reseller Apple Authorized Premium Service Provider Certified Members of the Apple Consultants Network 1-866-COM-PANY | Twitter | Facebook .
<snip>

The ASSP-generated regex excerpt (starts from the beginning, end clipped off, but I do show the two right parentheses at the end of the file):

(?^u:[\b\s](?:apple authorizedi resel
|appl authorized servic provid
|appl authorized premium servic provid
|certifi member appl consult network
|macintosh consult servic sale
|macintosh consult servic sale
|Apple Certified Macintosh Technician .
|Apple Certified iOS Technician .
|companyname is an iphon powered company
|find us href http www domain mobi
|john doe presid compan baltimor washington dc philadelphia appl author resel appl author premium servic provid certifi member appl consult network randnumb com pany twitter facebook
<snip>
))

The database lookup:

assp=# select * from hmmdb where pkey like '%iphone_powered%';
                            pkey                             |  pvalue   | pfrozen
-------------------------------------------------------------+-----------+---------
 companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered                | 0.9999999 | 0
 iphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus                  | 0.9999999 | 0
 an\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind                  | 0.9999999 | 0
 @domain.com\x1Cis\x1Can\x1Ciphone\x1Cpowered\x1Ccompany     | 0.9999999 | 0
 @domain.com\x1Ccompanyname\x1Cis\x1Can\x1Ciphone\x1Cpowered | 0.9999999 | 0
 @domain.com\x1Can\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind   | 0.9999999 | 0
 @domain.com\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus   | 0.9999999 | 0
 is\x1Can\x1Ciphone\x1Cpowered\x1Ccompany                    | 0.9999999 | 0
(8 rows)

The log shows disclaimers being removed, but clearly this one was not:

Jun-21-19 15:30:25 [Worker_10001] 453 attachment/image entries processed
Jun-21-19 15:30:25 [Worker_10001] Imported Files for HeloBlackList: 3,277
Jun-21-19 15:30:25 [Worker_10001] Imported Files for Bayes/HMM: 2,811
Jun-21-19 15:30:25 [Worker_10001] Disclaimer removed from 254 files
Jun-21-19 15:30:25 [Worker_10001] Finished in 136 seconds (24.10 files/s - 319.16 MByte)
Jun-21-19 15:30:30 [Worker_10001] Start populating Spamdb with 609,561 records - Bayesian check is now disabled!
Jun-21-19 15:30:30 [Worker_10001] Try to lock Spamdb database in 5 second(s)
Jun-21-19 15:30:35 [Worker_10001] Database import started for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Trying Bulkimport for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Database: PostgreSQL 09.02.2400
Jun-21-19 15:30:37 [Worker_10001] Info: version 2.4.3(15119) of file /usr/share/assp/assp_db_import.cfg is used for the import

I tried separating and simplifying the list, but to no avail. I also noticed that the regex had truncated words in some but not all cases, so I fixed that. I also removed the odd '.' line separators, which appeared in the regex with no rhyme or reason, and then made the regex file unwritable so ASSP couldn't change it, but none of this seemed to make much difference.

Shouldn't the database be free of disclaimer text listed as spammy entries, or am I looking at the wrong thing here?

If you need more info or need me to try anything else, let me know.

Thanks!

Phil Quesinberry
Q Systems Engineering, Inc.
Embedded Systems, Telecom, IT
(410) 969-8002 Ext.102
http://www.qsystemsengineering.com

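In case it helps anyone reproduce the check, here is a minimal Python sketch of how the hmmdb keys could be compared against the disclaimer text. It assumes the keys are word sequences joined by the 0x1C character (displayed as \x1C in the psql output above), and that entries beginning with '@' are domain tags rather than message words. The function names and the five-word window are hypothetical choices for illustration, not anything taken from ASSP itself.

```python
import re

SEP = "\x1c"  # assumed key separator, shown as \x1C by psql


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def disclaimer_ngrams(disclaimer, n_max=5):
    """Collect every run of 1..n_max consecutive words in the disclaimer."""
    w = words(disclaimer)
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(w) - n + 1):
            grams.add(tuple(w[i:i + n]))
    return grams


def keys_from_disclaimer(pkeys, disclaimer):
    """Return the keys whose words occur consecutively in the disclaimer.

    Parts starting with '@' are skipped on the assumption that they are
    domain tags, not words from the message body.
    """
    grams = disclaimer_ngrams(disclaimer)
    hits = []
    for key in pkeys:
        parts = tuple(p for p in key.split(SEP) if not p.startswith("@"))
        if parts in grams:
            hits.append(key)
    return hits
```

The keys could be pulled with a query such as `select pkey from hmmdb` and passed in as a list. Any key the function returns consists only of words that occur consecutively in the disclaimer text, which is the situation described above.
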
_______________________________________________
Assp-user mailing list
Assp-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-user