The installation is running the latest 2.6.3 (19169) build.
Perl:
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 39 registered patches, see perl -V for more detail)

To address a number of inexplicable false positives, especially with HMM, I
populated the disclaimers file with all of the company's commonly used
disclaimers. However, a test message that contained little more than a
disclaimer was still evaluated as spam by the Bayesian and HMM filters. I
then looked at the database itself and confirmed that the disclaimer text
was still there.

The disclaimer definition excerpt in question (starts from the beginning,
end clipped off):
# separated and added common strings at beginning
Apple Authorized Reseller
.
Apple Authorized Service Provider
.
Apple Authorized Premium Service Provider
.
Certified Members of the Apple Consultants Network
.
Macintosh Consulting, Service, Sales
.
Macintosh Consulting, Service, and Sales
.
Apple Certified Macintosh Technician
.
Apple Certified iOS Technician
.
CompanyName is an iPhone powered company
.
Find us at http://www.domain.mobi
.
John Doe,  President
CompanyName  Baltimore  |  Washington DC  |  Philadelphia
Apple Authorized Reseller
Apple Authorized Premium Service Provider
Certified Members of the Apple Consultants Network
1-866-COM-PANY  |  Twitter  |  Facebook
.
<snip>

The ASSP-generated regex excerpt (starts from the beginning, end clipped off
but I do show the two right-parentheses at the end of the file):
(?^u:[\b\s](?:apple authorizedi resel 
|appl authorized servic provid
|appl authorized premium servic provid
|certifi member appl consult network
|macintosh consult servic sale
|macintosh consult servic sale
|Apple Certified Macintosh Technician
.

|Apple Certified iOS Technician
.

|companyname is an iphon powered company
|find us href http www domain mobi
|john doe presid compan baltimor washington dc philadelphia appl author resel appl author premium servic provid certifi member appl consult network randnumb com pany twitter facebook
<snip>
))
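
One way to spot the entries that apparently were not normalized is to scan
the generated alternatives for uppercase letters or stray '.' lines. The
following is only a minimal sketch in Python, assuming the alternatives are
separated by '|' as in the excerpt above and contain no escaped '|'
characters. The function name suspicious_alternatives and both heuristics
are hypothetical and are not anything ASSP provides.

```python
import re


def suspicious_alternatives(regex_body: str) -> list[str]:
    """Return alternatives that look un-normalized (a heuristic, not ASSP logic)."""
    flagged = []
    for alt in regex_body.split("|"):
        # An alternative may span several lines in the generated file.
        lines = [ln.strip() for ln in alt.splitlines() if ln.strip()]
        if not lines:
            continue
        has_upper = any(re.search(r"[A-Z]", ln) for ln in lines)
        has_dot_line = any(ln == "." for ln in lines)
        if has_upper or has_dot_line:
            flagged.append(" / ".join(lines))
    return flagged


# A few alternatives copied from the excerpt above.
sample = """appl authorized servic provid
|macintosh consult servic sale
|Apple Certified Macintosh Technician
.

|Apple Certified iOS Technician
.

|companyname is an iphon powered company
"""

for alt in suspicious_alternatives(sample):
    print(alt)
```

On this sample, only the two "Apple Certified ... Technician" entries are
flagged, which matches what is visible by eye in the excerpt.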


The database lookup:
assp=# select * from hmmdb where pkey like '%iphone_powered%';
                             pkey                             |  pvalue   | pfrozen
--------------------------------------------------------------+-----------+---------
 companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered                 | 0.9999999 |       0
 iphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus                   | 0.9999999 |       0
 an\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind                   | 0.9999999 |       0
 @domain.com\x1Cis\x1Can\x1Ciphone\x1Cpowered\x1Ccompany      | 0.9999999 |       0
 @domain.com\x1Ccompanyname\x1Cis\x1Can\x1Ciphone\x1Cpowered  | 0.9999999 |       0
 @domain.com\x1Can\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind    | 0.9999999 |       0
 @domain.com\x1Ciphone\x1Cpowered\x1Ccompany\x1Cfind\x1Cus    | 0.9999999 |       0
 is\x1Can\x1Ciphone\x1Cpowered\x1Ccompany                     | 0.9999999 |       0
(8 rows)
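
To make these keys easier to compare against the disclaimer text, the \x1C
separators can be split out into word chains. This is a rough sketch in
Python, assuming the keys are pasted exactly as psql prints them, with the
separator shown as the four literal characters \x1C. The helpers
split_hmm_key and contains_chain are hypothetical and not part of ASSP.

```python
SEP = r"\x1C"  # literal backslash, x, 1, C, as displayed in the psql output


def split_hmm_key(pkey: str) -> list[str]:
    """Split a pasted hmmdb key into its individual tokens."""
    return pkey.strip().split(SEP)


def contains_chain(tokens: list[str], text: str) -> bool:
    """True if the tokens occur as consecutive words in text (case-insensitive)."""
    words = text.lower().split()
    n = len(tokens)
    return any(words[i:i + n] == tokens for i in range(len(words) - n + 1))


disclaimer = "CompanyName is an iPhone powered company"
keys = [
    r"companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered",
    r"@domain.com\x1Cis\x1Can\x1Ciphone\x1Cpowered\x1Ccompany",
]

for key in keys:
    tokens = split_hmm_key(key)
    print(" ".join(tokens), "->", contains_chain(tokens, disclaimer))
```

The first key is a word-for-word chain from the disclaimer line. The second
is not matched by this simple check only because of its @domain.com prefix.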


The log shows disclaimers being removed, but clearly this one was not:
Jun-21-19 15:30:25 [Worker_10001] 453 attachment/image entries processed
Jun-21-19 15:30:25 [Worker_10001] Imported Files for HeloBlackList: 3,277
Jun-21-19 15:30:25 [Worker_10001] Imported Files for Bayes/HMM: 2,811
Jun-21-19 15:30:25 [Worker_10001] Disclaimer removed from       254 files
Jun-21-19 15:30:25 [Worker_10001] Finished in 136 seconds (24.10 files/s - 319.16 MByte)
Jun-21-19 15:30:30 [Worker_10001] Start populating Spamdb with 609,561 records - Bayesian check is now disabled!
Jun-21-19 15:30:30 [Worker_10001] Try to lock Spamdb database in 5 second(s)
Jun-21-19 15:30:35 [Worker_10001] Database import started for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Trying Bulkimport for table spamdb
Jun-21-19 15:30:37 [Worker_10001] Database: PostgreSQL 09.02.2400
Jun-21-19 15:30:37 [Worker_10001] Info: version 2.4.3(15119) of file /usr/share/assp/assp_db_import.cfg is used for the import
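
For comparing these counts from one rebuild to the next, the relevant
figures can be pulled out of the log text. This is a small sketch in Python,
assuming the log wording stays exactly as shown above. The regular
expressions and the rebuild_counts name are hypothetical and not part of
ASSP.

```python
import re


def rebuild_counts(log_text: str) -> dict[str, int]:
    """Extract the Bayes/HMM file count and the disclaimer-removal count."""
    patterns = {
        "bayes_hmm_files": r"Imported Files for Bayes/HMM:\s*([\d,]+)",
        "disclaimer_removed": r"Disclaimer removed from\s+([\d,]+)\s+files",
    }
    counts = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:
            # Counts are logged with thousands separators, e.g. "2,811".
            counts[name] = int(match.group(1).replace(",", ""))
    return counts


log = """Jun-21-19 15:30:25 [Worker_10001] Imported Files for Bayes/HMM: 2,811
Jun-21-19 15:30:25 [Worker_10001] Disclaimer removed from       254 files
"""

counts = rebuild_counts(log)
print(counts)
```

For the excerpt above this yields 2811 imported files and 254 files with a
disclaimer removed, so most corpus files had nothing stripped.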

I tried separating and simplifying the list, but to no avail.  I also noticed
that the regex had truncated words in some but not all cases, so I fixed
that and removed the odd '.' line separators, which appeared in the regex
with no rhyme or reason.  I then made the regex file unwritable so ASSP
couldn't change it, but none of this seemed to make much difference.

Shouldn't the database NOT have the disclaimers listed as spammy entries, or
am I looking at the wrong stuff here?

If you need more info or need me to try anything else, let me know.

Thanks!

Phil Quesinberry
Q Systems Engineering, Inc.
Embedded Systems, Telecom, IT
(410) 969-8002  Ext.102
http://www.qsystemsengineering.com


