Thomas,

Agreed, but the reason I analyzed the database in the first place was that
the Bayes/HMM output was flagging the disclaimers as spammy, likely due to
their presence in errors/spam reports.  It’s definitely much better now,
though, and the mail is being detected as not spam.

I can certainly compose some fake mails to put into the corrected-notspam
corpus if this becomes a problem again, but just as an FYI, there is still
some spammy Bayesian output related to the disclaimers, so the disclaimer
removal process doesn’t seem to catch all of them:

Bayesian Analysis: - word stemming engine is used - language italian(text) detected
Bad Words          Bad Prob   Good Words                        Good Prob
                              helo pv50p00im-ztdgrandword.me    0.0370
                              [addr] ssub                       0.0370
                              pv50p00im-ztdgrandword.me rcpt    0.0370
                              sender [addr]                     0.0370
                              ssub test                         0.1273
is an              0.7529
company find       0.7423
an iphon           0.7312
www domain         0.6286
us at              0.6267
at href            0.6164
powered company    0.6164
companyname is     0.6143
domain mob         0.6000
iphon powered      0.5942



HMM Analysis:
Bad Sequences                 Bad Prob   Good Sequences                                         Good Prob
                                         rcpt [addr] sender [addr] ssub                         0.0000 *
                                         [addr] sender [addr] ssub test                         0.0000 *
                                         pv50p00im-ztdgrandword.me rcpt [addr] sender [addr]    0.0000 *
                                         helo pv50p00im-ztdgrandword.me rcpt [addr] sender      0.0000 *
powered company find us at    0.6000
company find us at href       0.6000



Bayesian Spam Probability: doubtful NOT SPAM
spam-probability:       3.2086685e-09
ham-probability:        1.6400847e-05
combined probability:   0.00019560 - got 15 - used 15 most significant results
answer/query relation:  71% of 21
bayesian confidence:    0.00000744
corpus confidence:      0.88889008

Values marked with an *, are irrelevant for the confidence calculation.


Hidden-Markov-Model Spam Probability: confident NOT SPAM
spam-probability:       3.6e-29
ham-probability:        0.15999994
combined probability:   0.00000000 - got 6 - used 6 most significant results
HMM confidence:         0.01975311
answer/query relation:  33% of 18
corpus confidence:      0.88889008

Values marked with an *, are irrelevant for the confidence calculation.

HMM and Bayesian Log:

Jun-27-19 12:49:10 [Main_Thread] HMM Check [scoring] - Prob: 0.00000 - Confidence: 0.01975 => confident.ham - answer/query relation: 33% of 18
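The entries above appear to come from consecutive runs of normalized words: the Bayesian "Bad Words" are word pairs (e.g. "powered company", "company find") and the HMM "Bad Sequences" are five-word runs ("powered company find us at", "company find us at href"). A minimal Python sketch of such sliding windows follows; the window sizes 2 and 5 are inferred from this output only, not taken from the ASSP source, and the helper name `windows` is hypothetical.

```python
def windows(tokens, size):
    """Return every run of `size` consecutive tokens, joined by spaces.

    Assumption: sizes 2 (Bayes pairs) and 5 (HMM sequences) are inferred
    from the analysis output above, not from ASSP's documented behavior.
    """
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = ["powered", "company", "find", "us", "at", "href"]
pairs = windows(tokens, 2)      # Bayesian-style word pairs
sequences = windows(tokens, 5)  # HMM-style five-word sequences
print(pairs)
print(sequences)
```

Both HMM sequences listed as bad fall out of the same six-word run, which is consistent with the same stretch of disclaimer text producing several of the spammy Bayes pairs.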


Let me take a moment to say, though, that this is, without a doubt, THE
GREATEST SPAM FILTER EVER!!!  The large number of checks and the
configurability allow you to tweak the spammers into submission.  I’ve been
using this project for years, first on Windows and then on Linux server
platforms (we become wiser as we age), with results that are simply amazing.

THANK YOU!
And you too, Fritz. May you rest in peace.

Phil Quesinberry
Q Systems Engineering, Inc.
Embedded Systems, Telecom, IT
(410) 969-8002  Ext.102
http://www.qsystemsengineering.com

From: Thomas Eckardt
Sent: Sunday, June 23, 2019 3:54 AM
To: For Users of ASSP
Subject: Re: [Assp-user] Disclaimers not being removed?

These are spam entries (> 0.6). To correct them, put the content into the
correctednotspam folder.

IMHO, analyzing the database makes no sense. Instead, analyze emails and
check that there is no Bayes and HMM output related to the disclaimer.

Thomas



From:        "Phil Quesinberry" <pques...@qsystemsengineering.com>
To:        "'For Users of ASSP'" <assp-user@lists.sourceforge.net>
Date:        23.06.2019 04:13
Subject:        Re: [Assp-user] Disclaimers not being removed?


Thanks, Thomas, for the info and explanation; that makes sense.

One question, though: I’m trying to understand the difference between
spammy and hammy entries in the database, so I ran the following query:

assp=# select * from hmmdb where pkey like '%testosterone%';
                                       pkey                                       |  pvalue   | pfrozen
----------------------------------------------------------------------------------+-----------+---------
 testosterone\x1Cand\x1Cfeel\x1Cstrong\x1Cssub                                    | 0.9999999 |       0
 @domain.com\x1Cfree\x1Ctestosterone\x1Cand\x1Cfeel\x1Cstrong                     | 0.9999999 |       0
 ssub\x1Cboost\x1Cfree\x1Ctestosterone\x1Cand                                     | 0.9999999 |       0
 @domain.com\x1Cssub\x1Cboost\x1Cfree\x1Ctestosterone\x1Cand                      | 0.9999999 |       0
 @domain.com\x1Cboost\x1Cfree\x1Ctestosterone\x1Cand\x1Cfeel                      | 0.9999999 |       0
 free\x1Ctestosterone\x1Cand\x1Cfeel\x1Cstrong                                    | 0.9999999 |       0
 @domain.com\x1C98d6915738f9d2b8e981c34b\x1Cssub\x1Cboost\x1Cfree\x1Ctestosterone | 0.9999999 |       0
 boost\x1Cfree\x1Ctestosterone\x1Cand\x1Cfeel                                     | 0.9999999 |       0
 98d6915738f9d2b8e981c34b\x1Cssub\x1Cboost\x1Cfree\x1Ctestosterone                | 0.9999999 |       0
 @domain.com\x1Ctestosterone\x1Cand\x1Cfeel\x1Cstrong\x1Cssub                     | 0.9999999 |       0
(10 rows)
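The pkey values above appear to be word sequences joined by the ASCII File Separator character (0x1C), which psql prints as the text `\x1C`, while the pvalue column holds the per-entry weight (0.9999999 on every one of these spammy rows). A small Python sketch that splits such a key back into words, accepting either the real 0x1C byte or the `\x1C` text copied from psql output; the helper name `decode_pkey` is hypothetical and the separator is an assumption based on the output shown.

```python
import re

# Assumption: words in an hmmdb key are separated by the ASCII File
# Separator (0x1C). psql displays that byte as the text "\x1C", so this
# pattern accepts both the raw character and the escaped text form.
SEPARATOR = re.compile(r"\x1c|\\x1[cC]")


def decode_pkey(pkey):
    """Split a stored key into its individual words, dropping empties."""
    return [word for word in SEPARATOR.split(pkey) if word]


row = ("testosterone\\x1Cand\\x1Cfeel\\x1Cstrong\\x1Cssub", 0.9999999)
words, pvalue = decode_pkey(row[0]), row[1]
print(words, pvalue)
```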

These spammy entries look just like the disclaimer entries, which you seemed
to be saying belong in corrected-notspam.  Sorry, I don’t know enough Perl to
figure out how the code deals with this.  Are these entries simply in a
different section of the database, or does each entry in fact contain enough
info to identify whether it is a spam or ham word?  When I just dump the
database, spam and ham entries appear together, so it seems to be the latter.

Thanks again,

- Phil



Re: [Assp-user] Disclaimers not being removed? 
<https://sourceforge.net/p/assp/mailman/message/36699848/>
From: Thomas Eckardt  - 2019-06-22 07:16:34
>I also noticed the regex had truncated words in some but not all cases so
>I fixed that

ASSP_WordStem.pm is installed and used, so word stemming is done and
stop-words are removed. Any attempt to "fix" this is wrong!
If the disclaimer is not stemmed in the mail, another language was
detected for the mail. There is nothing you can (or should) fix.

The disclaimer definition and every mail are processed as follows:

- remove all special characters and spaces
- detect the language
- stem all words according to the detected language
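The three steps can be illustrated with a toy Python pipeline. This is only a sketch of the general idea, not ASSP_WordStem.pm: the stop-word lists, the suffix-stripping "stemmer", and all function names are hypothetical stand-ins (a real implementation would use a proper stemmer such as Snowball). It does show why the same text can end up with different tokens depending on which language is detected.

```python
import re

# Hypothetical mini stop-word lists, used only for a naive language guess.
STOP_WORDS = {
    "english": {"the", "and", "is", "an", "of", "to", "this"},
    "italian": {"il", "e", "di", "che", "la", "per", "questo"},
}

# Hypothetical suffix lists; a real stemmer is far more careful.
SUFFIXES = {
    "english": ("ing", "ed", "es", "s", "e"),
    "italian": ("mente", "zione", "i", "e", "o", "a"),
}


def normalize(text):
    """Step 1: replace special characters with spaces, lowercase, split."""
    return re.sub(r"\W+", " ", text.lower()).split()


def detect_language(words):
    """Step 2: pick the language whose stop words occur most often."""
    return max(STOP_WORDS, key=lambda lang: sum(w in STOP_WORDS[lang] for w in words))


def stem(word, lang):
    """Step 3: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES[lang]:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process(text):
    words = normalize(text)
    lang = detect_language(words)
    # Stop words of the detected language are dropped before stemming.
    stems = [stem(w, lang) for w in words if w not in STOP_WORDS[lang]]
    return lang, stems


print(process("This company is an iPhone-powered company!"))
```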

Another way to make sure the disclaimer is ignored by ASSP is to compose
one or more fake mails that contain only the disclaimer (possibly
repeated multiple times).
Put them in the opposite correction folder.

 companyname\x1Cis\x1Can\x1Ciphone\x1Cpowered                   | 0.9999999 |       0

(here this would be corrected-notspam)

Make sure the MD5 hash of the body is different in all these mails.
Remove the disclaimer definition.
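One way to build the fake disclaimer-only mails so that no two bodies share an MD5 hash is to repeat the disclaimer a different number of times in each one and check the digests before copying the files into the correction folder. A minimal Python sketch; both helper names are hypothetical, the sample text is taken from the disclaimer quoted in this thread, and it assumes the hash is computed over the body text as written (ASSP may normalize the body before hashing).

```python
import hashlib


def make_fake_bodies(disclaimer, count):
    """Body i repeats the disclaimer i + 1 times, so every body differs."""
    return ["\n\n".join([disclaimer] * (i + 1)) for i in range(count)]


def md5_hashes_unique(bodies):
    """True if no two bodies produce the same MD5 hex digest."""
    digests = [hashlib.md5(body.encode("utf-8")).hexdigest() for body in bodies]
    return len(set(digests)) == len(digests)


disclaimer = "This email and any files transmitted with it may be confidential."
bodies = make_fake_bodies(disclaimer, 3)
print(md5_hashes_unique(bodies))
```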

The disclaimer content will then get a weight between 0.4 and 0.6 and will
not be stored in the databases, or it will get a weight <= 0.4 and will be
detected as good.
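Combined with the "> 0.6" spam threshold from Thomas's June 23 reply, the weights described here fall into three bands. A small Python sketch of that banding; the function name is hypothetical, and the handling of the exact boundary values (0.4 counts as good, 0.6 as neutral) is read from the wording in this thread, not checked against the ASSP source.

```python
def classify_weight(weight):
    """Band a Bayes/HMM weight as described in this thread (boundaries assumed)."""
    if weight <= 0.4:
        return "good"     # detected as good
    if weight > 0.6:
        return "spam"     # e.g. the 0.9999999 entries in the database
    return "neutral"      # 0.4 < weight <= 0.6: not stored in the databases


for w in (0.0370, 0.5, 0.9999999):
    print(w, classify_weight(w))
```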

Thomas



_______________________________________________
Assp-user mailing list
Assp-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-user




DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally 
privileged and protected in law and are intended solely for the use of the
individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no known 
virus in this email!
*******************************************************


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus