>I also noticed the regex had truncated words in some but not all cases so I fixed that
ASSP_WordStem.pm is installed and used, so word stemming is done and stop words are removed. Any attempt to "fix" this is wrong! If the disclaimer is not stemmed in the mail, another language was detected for that mail. There is nothing you can (or should) fix.

The disclaimer definition and every mail are processed as follows:
- remove all special characters and spaces
- detect the language
- stem all words according to the detected language

Another way to make sure the disclaimer is ignored by ASSP is to compose one or more fake mails that contain only disclaimers (possibly multiple times). Put them in the opposite correction folder.

companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered | 0.9999999 | 0 (here this would be corrected-notspam)

Make sure the MD5 hash of the body is different in all these mails. Remove the disclaimer definition.
The disclaimer content will then get a weight between 0.4 and 0.6 and will not be stored in the databases, or it will get a weight <= 0.4 and will be detected as good.

Thomas

From: "Phil Quesinberry" <pques...@qsystemsengineering.com>
To: "'For Users of ASSP'" <assp-user@lists.sourceforge.net>
Date: 21.06.2019 22:16
Subject: [Assp-user] Disclaimers not being removed?

Installation running the latest 2.6.3 (19169) build.

Perl: This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi (with 39 registered patches, see perl -V for more detail)

To try and address a number of inexplicable false positives, especially with HMM, I populated the disclaimers file with all commonly used disclaimers for the company, but a test message which pretty much contained only a disclaimer was evaluated as spam by the Bayesian and HMM filters. So I decided to take a look at the database itself and confirmed that the disclaimer was still there.

The disclaimer definition excerpt in question (starts from the beginning, end clipped off):

# separated and added common strings at beginning
Apple Authorized Reseller .
Apple Authorized Service Provider .
Apple Authorized Premium Service Provider .
Certified Members of the Apple Consultants Network .
Macintosh Consulting, Service, Sales .
Macintosh Consulting, Service, and Sales .
Apple Certified Macintosh Technician .
Apple Certified iOS Technician .
CompanyName is an iPhone powered company .
Find us at http://www.domain.mobi .
John Doe, President CompanyName Baltimore | Washington DC | Philadelphia Apple Authorized Reseller Apple Authorized Premium Service Provider Certified Members of the Apple Consultants Network 1-866-COM-PANY | Twitter | Facebook .
<snip>

The ASSP-generated regex excerpt (starts from the beginning, end clipped off, but I do show the two right parentheses at the end of the file):

(?^u:[\b\s](?:apple authorizedi resel
|appl authorized servic provid
|appl authorized premium servic provid
|certifi member appl consult network
|macintosh consult servic sale
|macintosh consult servic sale
|Apple Certified Macintosh Technician .
|Apple Certified iOS Technician .
|companyname is an iphon powered company
|find us href http www domain mobi
|john doe presid compan baltimor washington dc philadelphia appl author resel appl author premium servic provid certifi member appl consult network randnumb com pany twitter facebook
<snip>
))

The database lookup:

assp=# select * from hmmdb where pkey like '%iphone_powered%';
                             pkey                             |  pvalue   | pfrozen
--------------------------------------------------------------+-----------+---------
 companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered                 | 0.9999999 |       0
 iphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus                   | 0.9999999 |       0
 an\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind                   | 0.9999999 |       0
 @domain.com\x1Cis\x1Can\x1Ciphone\x1Cpowered\x1Ccompany      | 0.9999999 |       0
 @domain.com\x1Ccompanyname\x1Cis\x1Can\x1Ciphone\x1Cpowered  | 0.9999999 |       0
 @domain.com\x1Can\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind    | 0.9999999 |       0
 @domain.com\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus    | 0.9999999 |       0
 is\x1Can\x1Ciphone\x1Cpowered\x1Ccompany                     | 0.9999999 |       0
(8 rows)

The
log shows disclaimers being removed, but clearly this one was not:

Jun-21-19 15:30:25 [Worker_10001] 453 attachment/image entries processed
Jun-21-19 15:30:25 [Worker_10001] Imported Files for HeloBlackList: 3,277
Jun-21-19 15:30:25 [Worker_10001] Imported Files for Bayes/HMM: 2,811
Jun-21-19 15:30:25 [Worker_10001] Disclaimer removed from 254 files
Jun-21-19 15:30:25 [Worker_10001] Finished in 136 seconds (24.10 files/s - 319.16 MByte)
Jun-21-19 15:30:30 [Worker_10001] Start populating Spamdb with 609,561 records - Bayesian check is now disabled!
Jun-21-19 15:30:30 [Worker_10001] Try to lock Spamdb database in 5 second(s)
Jun-21-19 15:30:35 [Worker_10001] Database import started for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Trying Bulkimport for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Database: PostgreSQL 09.02.2400
Jun-21-19 15:30:37 [Worker_10001] Info: version 2.4.3(15119) of file /usr/share/assp/assp_db_import.cfg is used for the import

I tried separating/simplifying the list, but to no avail. I also noticed the regex had truncated words in some but not all cases, so I fixed that. I also removed the odd ‘.’ line separators, which appeared in the regex with no rhyme or reason, and then made the regex file unwritable so ASSP couldn’t change it, but it didn’t seem to make much difference.

Shouldn’t the database NOT have the disclaimers listed as spammy entries, or am I looking at the wrong stuff here?

If you need more info or need me to try anything else, let me know.

Thanks!

Phil Quesinberry
Q Systems Engineering, Inc.
Embedded Systems, Telecom, IT
(410) 969-8002 Ext.102
http://www.qsystemsengineering.com

Virus-free.
www.avast.com

_______________________________________________
Assp-user mailing list
Assp-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-user

DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally privileged and protected in law and are intended solely for the use of the individual to whom it is addressed.
This email was scanned for viruses multiple times. There should be no known virus in this email!
*******************************************************
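The three processing steps listed in the reply above (remove special characters, detect the language, stem according to the detected language) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the stop-word lists, the suffix rules, and the function names `clean`, `detect_language`, `stem` and `normalize` are toy stand-ins, not what ASSP_WordStem.pm actually does. It only shows why the same disclaimer text can come out differently when a different language is detected.

```python
import re

# Toy stop-word lists, used for language guessing and stop-word removal.
# Illustrative assumptions only, not the lists ASSP uses.
STOP_WORDS = {
    "en": {"the", "is", "an", "a", "of", "and", "at", "us"},
    "de": {"der", "die", "das", "ist", "ein", "eine", "und", "bei", "uns"},
}

# Toy suffix rules standing in for a real stemmer.
SUFFIXES = {
    "en": ("ing", "ed", "es", "s", "e"),
    "de": ("ungen", "ung", "en", "er", "e"),
}


def clean(text):
    """Step 1: lowercase, replace special characters, collapse whitespace."""
    text = re.sub(r"[^0-9a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def detect_language(words):
    """Step 2: pick the language whose stop words occur most often."""
    counts = {lang: sum(w in sw for w in words) for lang, sw in STOP_WORDS.items()}
    return max(counts, key=counts.get)


def stem(word, lang):
    """Step 3: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES[lang]:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, lang=None):
    """Run all three steps; `lang` can be forced to simulate a misdetection."""
    words = clean(text).split()
    lang = lang or detect_language(words)
    kept = [w for w in words if w not in STOP_WORDS[lang]]
    return lang, " ".join(stem(w, lang) for w in kept)
```

With these toy rules, `normalize("CompanyName is an iPhone powered company.")` is treated as English, while forcing `lang="de"` keeps "is" and "an" and leaves "powered" unstemmed, so a pattern built from the English form would no longer match.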
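The suggestion in the reply above, composing several fake mails that contain only the disclaimer and making sure each body has a different MD5 hash, could be prepared with a sketch like the following. The function name `build_disclaimer_mails`, the sender address and the subject lines are hypothetical, and the thread does not show how ASSP itself hashes a body, so hashing the plain-text body here is an assumption. Varying the number of repetitions is just one simple way to get distinct bodies.

```python
import hashlib
from email.message import EmailMessage


def build_disclaimer_mails(disclaimer, count, sender="noreply@example.com"):
    """Return (md5_hex, message) pairs; body n repeats the disclaimer n times."""
    mails, seen = [], set()
    for n in range(1, count + 1):
        body = "\n\n".join([disclaimer] * n) + "\n"
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        if digest in seen:
            # Guard in case two bodies ever hash to the same value.
            raise ValueError("duplicate body hash")
        seen.add(digest)
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = sender
        msg["Subject"] = f"disclaimer sample {n}"
        msg.set_content(body)
        mails.append((digest, msg))
    return mails
```

Each message could then be written out with `msg.as_bytes()` into the appropriate correction folder (corrected-notspam in the example from the reply); the folder path depends on the installation and is not shown in this thread.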