>I also noticed the regex had truncated words in some but not all cases so
>I fixed that

ASSP_WordStem.pm is installed and in use, so word stemming is done and 
stop words are removed. Any attempt to "fix" this is wrong!
If the disclaimer is not stemmed in the mail, a different language was 
detected for that mail. There is nothing you can (or should) fix.

The disclaimer definition and every mail are processed as follows:

- remove all special characters and spaces
- detect the language
- stem all words according to the detected language
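
To illustrate why the same disclaimer can come out stemmed differently, 
here is a minimal Python sketch of the three steps above. It is not the 
code from ASSP_WordStem.pm: the stop-word lists, the suffix rules and the 
language detection are toy assumptions made up for this example, and all 
function names are hypothetical.

```python
import re

# Toy data for illustration only: NOT ASSP's real stop-word lists or stemmers.
STOP_WORDS = {
    "en": {"the", "is", "an", "a", "of", "and", "at", "us"},
    "de": {"der", "die", "das", "ist", "ein", "eine", "und", "uns"},
}
SUFFIXES = {
    "en": ("ized", "ing", "ed", "er", "es", "e", "s"),
    "de": ("ungen", "ung", "en", "er", "e"),
}


def tokenize(text):
    """Lower-case the text and turn every run of non-alphanumerics into a space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def detect_language(tokens):
    """Pick the language whose stop-word list matches the most tokens."""
    return max(STOP_WORDS, key=lambda lang: sum(t in STOP_WORDS[lang] for t in tokens))


def stem(word, lang):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES[lang]:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, lang=None):
    """Clean, detect the language (unless forced), drop stop words, stem."""
    tokens = tokenize(text)
    lang = lang or detect_language(tokens)
    return " ".join(stem(t, lang) for t in tokens if t not in STOP_WORDS[lang])


line = "CompanyName is an iPhone powered company"
print(normalize(line))             # language detected from the text itself
print(normalize(line, lang="de"))  # same text, treated as another language
```

With these toy rules the two calls give different strings, which is the 
effect described above: a definition and a mail only match when both were 
stemmed for the same detected language.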

Another way to make sure the disclaimer is ignored by ASSP is to compose 
one or more fake mails that contain only disclaimers (possibly 
multiple times).
Put them in the opposite correction folder. 

 companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered                   | 0.9999999 |       0

(here this would be corrected-notspam)

Make sure the MD5 hash of the body is different in all these mails.
Remove the disclaimer definition.
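
A minimal Python sketch of composing such mails, assuming it is enough to 
repeat the disclaimer text a different number of times in each mail so 
that every body differs. The function name, the addresses and the subject 
are hypothetical, and the MD5 computed here is over the plain body text, 
which may not be byte-for-byte what ASSP hashes.

```python
import hashlib
from email.message import EmailMessage


def build_disclaimer_mails(disclaimers, count=3):
    """Return (md5_of_body, message) pairs whose bodies are all different."""
    mails = []
    seen = set()
    for i in range(count):
        # Mail i repeats the whole disclaimer list i+1 times, so no two
        # bodies are identical.
        body = "\n\n".join(disclaimers * (i + 1)) + "\n"
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        if digest in seen:  # defensive check; should not trigger
            continue
        seen.add(digest)
        msg = EmailMessage()
        msg["From"] = "disclaimer-training@example.com"  # placeholder address
        msg["To"] = "postmaster@example.com"             # placeholder address
        msg["Subject"] = f"disclaimer training {i + 1}"
        msg.set_content(body)
        mails.append((digest, msg))
    return mails


for digest, msg in build_disclaimer_mails(
    ["CompanyName is an iPhone powered company", "Find us at http://www.domain.mobi"]
):
    print(digest, msg["Subject"])
```

Each message could then be saved with msg.as_bytes() into the correction 
folder of your installation; the folder path is deliberately left out 
here because it depends on your configuration.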

The disclaimer content will then either get a weight between 0.4 and 0.6 
and will not be stored in the databases, or it will get a weight <= 0.4 
and will be detected as good.
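
As a rough illustration of these thresholds, the Python sketch below 
classifies a weight into the ranges named above. The weight formula (a 
plain spam/(spam+ham) ratio) is only a simplifying assumption for the 
example and is not ASSP's actual calculation; the "spammy" range >= 0.6 is 
inferred by symmetry, and both function names are hypothetical.

```python
def simple_weight(spam_count, ham_count):
    """Assumed stand-in for the real weighting: share of spam occurrences."""
    total = spam_count + ham_count
    return 0.5 if total == 0 else spam_count / total


def classify(weight):
    """Map a weight to the ranges mentioned in this mail."""
    if weight <= 0.4:
        return "good"
    if weight < 0.6:
        return "neutral (not stored)"
    return "spammy"  # assumed counterpart of the 'good' range


# A phrase seen 5 times in spam only, then balanced by notspam corrections.
for ham in (0, 5, 10):
    w = simple_weight(5, ham)
    print(f"spam=5 ham={ham:2d} weight={w:.2f} -> {classify(w)}")
```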

Thomas



From:    "Phil Quesinberry" <pques...@qsystemsengineering.com>
To:      "'For Users of ASSP'" <assp-user@lists.sourceforge.net>
Date:    21.06.2019 22:16
Subject: [Assp-user] Disclaimers not being removed?



Installation running the latest 2.6.3 (19169) build.
Perl:
This is perl 5, version 16, subversion 3 (v5.16.3) built for 
x86_64-linux-thread-multi
(with 39 registered patches, see perl -V for more detail)
To try and address a number of inexplicable false-positives especially 
with HMM, I populated the disclaimers file with all commonly-used 
disclaimers for the company but a test message which pretty much contained 
only a disclaimer was evaluated as spam by the Bayesian and HMM filters. 
So I decided to take a look at the database itself and confirmed that the 
disclaimer was still there.
The disclaimer definition excerpt in question (starts from the beginning, 
end clipped off):
# separated and added common strings at beginning
Apple Authorized Reseller
.
Apple Authorized Service Provider
.
Apple Authorized Premium Service Provider
.
Certified Members of the Apple Consultants Network
.
Macintosh Consulting, Service, Sales
.
Macintosh Consulting, Service, and Sales
.
Apple Certified Macintosh Technician
.
Apple Certified iOS Technician
.
CompanyName is an iPhone powered company
.
Find us at http://www.domain.mobi
.
John Doe,  President
CompanyName  Baltimore  |  Washington DC  |  Philadelphia
Apple Authorized Reseller
Apple Authorized Premium Service Provider
Certified Members of the Apple Consultants Network
1-866-COM-PANY  |  Twitter  |  Facebook
.
<snip>
The ASSP-generated regex excerpt (starts from the beginning, end clipped 
off but I do show the two right-parentheses at the end of the file):
(?^u:[\b\s](?:apple authorizedi resel 
|appl authorized servic provid
|appl authorized premium servic provid
|certifi member appl consult network
|macintosh consult servic sale
|macintosh consult servic sale
|Apple Certified Macintosh Technician
.
|Apple Certified iOS Technician
.
|companyname is an iphon powered company
|find us href http www domain mobi
|john doe presid compan baltimor washington dc philadelphia appl author 
resel appl author premium servic provid certifi member appl consult 
network randnumb com pany twitter facebook
<snip>
))
The database lookup:
assp=# select * from hmmdb where pkey like '%iphone_powered%';
                             pkey                             |  pvalue   | pfrozen
--------------------------------------------------------------+-----------+---------
 companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered                 | 0.9999999 |       0
 iphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus                   | 0.9999999 |       0
 an\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind                   | 0.9999999 |       0
 @domain.com\x1Cis\x1Can\x1Ciphone\x1Cpowered\x1Ccompany      | 0.9999999 |       0
 @domain.com\x1Ccompanyname\x1Cis\x1Can\x1Ciphone\x1Cpowered  | 0.9999999 |       0
 @domain.com\x1Can\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind    | 0.9999999 |       0
 @domain.com\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus    | 0.9999999 |       0
 is\x1Can\x1Ciphone\x1Cpowered\x1Ccompany                     | 0.9999999 |       0
(8 rows)
The log shows disclaimers being removed but clearly this one was not:
Jun-21-19 15:30:25 [Worker_10001] 453 attachment/image entries processed
Jun-21-19 15:30:25 [Worker_10001] Imported Files for HeloBlackList: 3,277
Jun-21-19 15:30:25 [Worker_10001] Imported Files for Bayes/HMM: 2,811
Jun-21-19 15:30:25 [Worker_10001] Disclaimer removed from       254 files
Jun-21-19 15:30:25 [Worker_10001] Finished in 136 seconds (24.10 files/s - 319.16 MByte)
Jun-21-19 15:30:30 [Worker_10001] Start populating Spamdb with 609,561 records - Bayesian check is now disabled!
Jun-21-19 15:30:30 [Worker_10001] Try to lock Spamdb database in 5 second(s)
Jun-21-19 15:30:35 [Worker_10001] Database import started for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Trying Bulkimport for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Database: PostgreSQL 09.02.2400
Jun-21-19 15:30:37 [Worker_10001] Info: version 2.4.3(15119) of file /usr/share/assp/assp_db_import.cfg is used for the import
I tried separating/simplifying the list but to no avail.  I also noticed 
the regex had truncated words in some but not all cases so I fixed that, 
also removing the odd ‘.’ line separators which appeared in the regex with 
no rhyme or reason and then making the regex file unwriteable so ASSP 
couldn’t change it, but it didn’t seem to make much difference.
Shouldn’t the database NOT have the disclaimers listed as spammy entries 
or am I looking at the wrong stuff here?
If you need more info or need me to try anything else, let me know.
Thanks!
Phil Quesinberry
Q Systems Engineering, Inc.
Embedded Systems, Telecom, IT
(410) 969-8002  Ext.102
http://www.qsystemsengineering.com


_______________________________________________
Assp-user mailing list
Assp-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-user




DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally 
privileged and protected in law and are intended solely for the use of the 

individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no 
known virus in this email!
*******************************************************

