Can I just take the names of the rules?
e.g. at least two checks should fire:
meta MULTIPLE_TESTS (( RAZOR2_CF_RANGE_51_100 + RAZOR2_CHECK +
URIBL_ABUSE_SURBL) > 1)
score MULTIPLE_TESTS 1
found in
X-Spam-Status: No, score=5.908 tagged_above=2 required=6.31
tests=[DKIM_SIGNED=0.1,
Low-score tests are neither spam nor ham signs by themselves. They can be
used in metas in conjunction with other indicators to help determine ham or
spam. A zero value indicates that a rule didn't hit and the sign is not
present. A small score indicates that the rule did hit, so the sign it is
Obviously the right way is for the master rules to be adjusted. But if you want
a local fix, try something like this:
score RCVD_IN_DNSWL_HI -0.001
meta MY_RCVD_IN_DNSWL_HI RCVD_IN_DNSWL_HI && !SPF_FAIL
score MY_RCVD_IN_DNSWL_HI -5
describe MY_RCVD_IN_DNSWL_HI
I haven't had a chance yet to read this thread carefully, but spamd when
run as root in tests will, at least in some cases, set itself to run as
user "nobody". If you do that in a subdirectory of your non-nobody user's
HOME, the usual permission configuration will not provide read access to
SpamAssassin cannot block or eliminate spam. It does not have the facilities to
do that. SA can only score potential spam.
Whatever method you used to glue SA into your mail path needs to parse the
score SA assigned in the returned mail, and do whatever routing it thinks is
appropriate.
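As a rough illustration of that glue step, here is a minimal Python sketch that reads the X-Spam-Status header and picks a destination folder. The function names, the folder names, and the assumption that the header looks like "No, score=... required=..." are all hypothetical; the exact header layout depends on how SA is called (amavisd, spamass-milter, spamc, etc.), so treat this as a starting point rather than a drop-in filter.

```python
import re
from email import message_from_string

# Assumed layout: "Yes|No, score=<n> ... required=<n> ..." (may be folded
# across several lines, hence DOTALL).
STATUS_RE = re.compile(
    r"^(?P<flag>Yes|No),\s+score=(?P<score>-?\d+(?:\.\d+)?)"
    r".*?\brequired=(?P<required>-?\d+(?:\.\d+)?)",
    re.DOTALL,
)


def spam_verdict(raw_message):
    """Return (is_spam, score, required), or None if the header is absent
    or not in the assumed layout."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = STATUS_RE.search(status.strip())
    if match is None:
        return None
    return (
        match.group("flag") == "Yes",
        float(match.group("score")),
        float(match.group("required")),
    )


def route(raw_message, spam_folder="Junk", inbox="INBOX"):
    """Hypothetical routing decision: trust the Yes/No flag SA assigned."""
    verdict = spam_verdict(raw_message)
    if verdict is None:
        return inbox  # unscanned mail is delivered normally
    is_spam, _score, _required = verdict
    return spam_folder if is_spam else inbox


sample = (
    "From: sender@example.com\n"
    "X-Spam-Status: No, score=5.908 tagged_above=2 required=6.31\n"
    "\ttests=[DKIM_SIGNED=0.1]\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(spam_verdict(sample), route(sample))
```

Only trust this header if it was added by your own scanner; a header that arrived with the message could have been forged by the sender.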
We
I've blocked him on my mail server, as well.
Reindl now and then says something useful, but as you have noticed his
people skills are somewhere in the negative 200 score level. I don't know
that I'd block him, but you do need to take anything he says with a few
horselicks of salt.
> header __FROM_THOMAS_1 From =~ //i
You can simplify this. The parenthesized grouping was only necessary when there
was more than one possible string, in my case .com and .net. Since you only
have .com you can remove the (?: and ) and make the regex a little more
efficient:
> header
> Am I correct? Sorry if I'm being dense. I'm just a sysadmin, not a developer,
> so I'm not super clear on how macros and expansions work in perl.
You have the concepts right. I'd try the rules you posted and see if they seem
to be producing correct results. You can run a spam thru SA with the
I am suddenly getting hammered by a BUNCH of spam that appears to be from
me. It scores low, and even though I keep feeding it to Bayes, it's still
not hitting the threshold to be marked as spam.
When I check the headers, it's coming from multiple random email servers,
but many appear to
I've patched spamass milter to let any previously added "X-Spam"
headers untouched
It's generally considered bad practice to pass thru X-Spam headers from an
unknown source.
Like most anything else in an email header, a spammer could inject his own
headers, probably populated with items
> meta FROM_CLIENT_TEST from FROM_CLIENT_EMAIL && FROM_CLIENT_IP
Is that a typo when you were making this mail, or is it actually how the line
is coded? There is an extra "from" there.
Even if you fix that, you won't get the results you expect. Both
FROM_CLIENT_EMAIL and FROM_CLIENT_IP will
This is not an area I know anything about, so I may be completely wrong.
That said, I seem to remember a conversation very like this some years back.
If I remember correctly, someone found some switch that could be set to get
spamass-milter to add the Received header before calling the other
But I was more interested if SA already has something like that?
It does not.
Weren't there a whole set of "FUZZY" rules once? I'm pretty sure that they
looked for words in the subject and maybe body of the email that had
exactly this sort of obfuscation. I don't think they were applied
I get a lot of spams, and a major characteristic is they only have
text/plain that is base-64 encoded.
Since I live in an area where base-64 encoding is basically never necessary,
almost all base-64 encoded text parts are major spam signs.
Content-Type: text/plain; charset=utf-8
From: "Bill Cole"
It is my understanding that an automated rescoring job was run quite some
time ago (before I was on the PMC) to generate the Bayes scores, which
determined that to be the best supplemental score to give to the greater
certainty.
I was around in those days. My memory isn't
From: "Reindl Harald"
in other words a system for morons - morons which will drag mails to spam
instead click on "unsubscribe"
per-user bayes don't work well, never
Well Harald, you are certainly welcome to your opinion. It would be nicer if
you had kept it to yourself though.
The system
the new spam and ham (respectively) get merged into
these folders after learning, and removed from the current Spam and Ham folders.
- Original Message -
From: Michael Grant
To: users@spamassassin.apache.org ; Loren Wilton ; hg user
Sent: Monday, February 20, 2023 12:47 PM
> Can you please give me some details on your bayes setup?
> Headers exclusion, bayes_token_sources, how do you "sa-learn" messages...
Standard options on Bayes. No autolearn. A cron job that will harvest Spam and
Ham mboxes and feed them to sa-learn once a day, then archive the learned
> The real question is: has bayes still its use case in 2023 ? Is it still used
> with important scores or just to flag messages for a review?
It works fine for me here.
They receive wildly different BAYES scores.
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
* [score: 0.0002]
* 2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
* [score: 0.0881]
This looks like you have per-user Bayes databases, and the message type has
been trained
I started seeing some spam today in the 1-1.5 MB range.
It's been over a year now, but for a while I was getting a huge number of
spams that were either 1143 KB or 3831 KB.
The 3831 KB variant used the same obfuscation payload as the 1143 KB spams,
they just put it in twice in a row.
Have some annoying SPAM that consistently shows a negative score on BAYES.
Is the default scoring or influenced by BAYES in some way?
*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
* [score: 0.]
The score is reasonable for guaranteed ham, which is what your Bayes thinks
this
I believe 3MB is above the default scan size for SA, so likely it won't even
look at the file.
Loren
- Original Message -
From: Rupert Gallagher
To: users@spamassassin.apache.org
Sent: Tuesday, February 07, 2023 2:26 AM
Subject: Re: New rule wanted
Note: Both
> header TO_SPECIFIC_DOMAIN To:addr =~ /\@(test\.com|test\.net)$/
That for efficiency really should use a non-capturing grouping:
header TO_SPECIFIC_DOMAIN To:addr =~ /\@(?:test\.com|test\.net)$/
Note the "?:" after the left parenthesis.
Loren
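For anyone who wants to convince themselves the two forms match the same addresses, here is a small Python sketch. Python's re module uses the same (?:...) syntax as Perl for this; the sample addresses are made up, and the sketch only shows that the matches are identical while the capture is dropped. It does not measure the speed difference.

```python
import re

# Same domain test, with and without a capturing group.
capturing = re.compile(r"@(test\.com|test\.net)$")
non_capturing = re.compile(r"@(?:test\.com|test\.net)$")

# Made-up addresses for illustration only.
addresses = [
    "alice@test.com",
    "bob@test.net",
    "carol@test.org",
    "dave@test.com.example",
]

# Both patterns accept and reject exactly the same addresses.
matched_capturing = [a for a in addresses if capturing.search(a)]
matched_non_capturing = [a for a in addresses if non_capturing.search(a)]

# The only difference: the capturing form saves the matched domain,
# the non-capturing form saves nothing.
saved_by_capturing = capturing.search("alice@test.com").groups()
saved_by_non_capturing = non_capturing.search("alice@test.com").groups()

print(matched_capturing == matched_non_capturing)
print(saved_by_capturing, saved_by_non_capturing)
```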
Why not do a simple rule rather than inventing some Perl code?
header TO_SPECIFIC_EMAIL To:addr =~
'(?:\bus...@example.com|\bus...@example.com|\bus...@example.com)'
describe TO_SPECIFIC_EMAIL Mail to a specific email address
score TO_SPECIFIC_EMAIL -2
header TO_SPECIFIC_DOMAIN To:addr
You can simplify your rule code a little if you want:
header __LOCAL_FROM_BE From =~ /.\.beauty/i
meta LOCAL_BE (__LOCAL_FROM_BE)
score LOCAL_BE 2
describe LOCAL_BE from beauty domain
to
header LOCAL_BE From =~ /.\.beauty/i
score LOCAL_BE 2
describe LOCAL_BE from beauty domain
The
If this is on 4.0, perhaps a bug should be opened.
- Original Message -
From: Shawn Iverson
To: SA Mailing list
Sent: Wednesday, December 21, 2022 10:05 AM
Subject: SA build from cpan fails under certain conditions
Hello SA Users,
Just posting this in case anyone else
Personally I'd look at why BIGNUM_EMAILS_MANY is hitting and see if there is
something the sender could do to avoid it. I'm pretty sure I've never seen that
rule hit in any of my spam, so it must be something a bit unique.
Loren
> body __ANIMALS /cat|mouse|bird|dog/i
There is a possible problem with your rule. It probably isn't related to what
you are seeing, but could be a problem for you anyway.
There is no word boundary in the regex, so 'cat' will match catamaran, 'mouse'
will match mousehouse, 'bird' will
So the alternative is adding a header and move it to the spam folder
automatically on the basis of the header?
Currently I just want to 'warn' users that the message is possible spam,
they can decide to move such emails automatically to a spam folder by
enabling a sieve rule.
What would be an
Pretty obviously a spam, I'm surprised that it didn't get a lot of "fake order"
type of points.
Here is the (or at least one) double URL that it caught:
;" href=3D"ht=
tps://nam02.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fsojaprote=
in.rs
I'm getting a bunch of spams from fake gmail accounts that consist of one
short line of text and a 2 MB jpg file.
The subject and body text are pretty much random beyond that.
How do I check for the following?
--e345f305ea2680cd
Content-Type: image/jpeg; name="MMM.jpg"
It sure seems to me like people are just using email to share pictures
(licenses, legal docs, as well as pictures of the kids.)
Are these messages that are being sent by individuals from their phones?
Or is there some program that is sending these?
I can see it being too much work to type a
> The problem I'm having is that my To header rules aren't matching because
> there is no To header,
> and I'm otherwise unsure what to match on. The only occurrence of the
> recipient in the entire email
> is in that Received header.
>
> It does match on "ALL", but I think I need to be more
header __HDRS_MISSP ALL:raw =~ /^(?:Subject|From|To|Reply-To):\S/ism
That rule just says: look at all the raw header data and match if there's
none of Subject, From, To, Reply-To entries.
IE a really malformed message.
Hum. As I read it, that is "headers misspelled" (not "headers missing")
Minicomputers-Exhume: sides
Malthus-Films: 88976dea
Parasitic-Homogeneity: db5da28ba3e69a
Capitalizations-Grievously: oilers
It looks like the pattern is
/[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}/
or something close to that.
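A quick way to sanity-check a pattern like this before turning it into a rule is to run it over the sample headers and a few ordinary ones. The sketch below does that in Python, assuming both length quantifiers are meant to be {1,20} and writing [\w\d] as the equivalent \w; the "ordinary" header lines are made-up examples.

```python
import re

# Two capitalized dictionary words joined by a hyphen, then a short value.
PATTERN = re.compile(r"[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}:\s{1,10}\w{3,20}")

# The four headers quoted from the spam.
samples = [
    "Minicomputers-Exhume: sides",
    "Malthus-Films: 88976dea",
    "Parasitic-Homogeneity: db5da28ba3e69a",
    "Capitalizations-Grievously: oilers",
]

# Made-up ordinary headers, to see what else the pattern would catch.
ordinary = [
    "Content-Type: text/plain; charset=utf-8",
    "Message-ID: <abc@example.com>",
    "Subject: hello there",
]


def hits(lines):
    """Return the lines the pattern matches anywhere."""
    return [line for line in lines if PATTERN.search(line)]


print(hits(samples))
print(hits(ordinary))
```

The pattern catches all four samples, but it also catches Content-Type, so on its own it would hit nearly every message. A usable rule would have to exclude the well-known hyphenated header names or require several such headers in one message.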
Obviously it can mutate, but generally these are
Fascinating thread I just stumbled on. Yes, in early parts of the phone
system, the letters were geographic and referenced the street for where
the central office was located switching those calls. For example, in
Arlington VA, my grandfather's number was 533-9389 which was referred to as
Is there a tool I can use to do a manual lint of the local.cf file ?
At command prompt:
spamassassin --lint
Loren
What is the purpose of the rule named T_SCC_BODY_TEXT_LINE? On my servers,
it hits nearly every spam and ham email.
Rules beginning with T_ are test rules, and should have a very small score.
So someone is testing some concept there. I don't seem to have this rule, so
I can't say any more
Just off the top of my head:
rawbody ONEDRIVE_DOWNLOAD m'https://onedrive\.live\.com/download[?]cid='
score ONEDRIVE_DOWNLOAD 0.5
describe ONEDRIVE_DOWNLOAD Download link to a file on Onedrive
Personally I'd be inclined to put an i on the end of that.
body
Cian is rumored to have said:
Anne, I am incredibly grateful for the offer. I sent my emails to the
tester and to the support email. Hopefully, they come up with
something actionable.
If you get a useful result it might be nice to summarize it to the list.
Loren
Are you talking about the use of m'' as the regex delimiter?
Yes.
It will probably work just fine for the foreseeable future, as long as the
input validation of rules files is lenient.
I think you may have a very hard time removing the m matching
delimiters from SA. I suspect there are at
No, I added that after observing multiple spams with random garbage after
the closing HTML tag in the HTML body part. Presumably it was an attempt
at Bayes poison, checksum avoidance, or some other filter evasion
technique.
I'll tighten it up.
FWIW, here is the rule I use. It obviously
But, it had:
* 2.5 CONTENT_AFTER_HTML More content after HTML close tag
but one was only text/plain and I could see nothing wrong. reading
72_active.cf I found:
rawbody __CONTENT_AFTER_HTML /<\/html>\s*[a-z0-9]/i
>
which fires on a text/plain part that discusses html
If they are more than a month or so old, just drop them would probably be
appropriate. Spam changes with time, and learning old spam patterns may not
do you much good.
If you aren't running Bayes, just dump all of them.
If you are running Bayes, it might be worth running the last month or so
that header should be on same host as the email clients read their mails,
if it's trusted outside of local mta, then it's forged say X-Spam-Flag: NO
do we want to trust it ?
I have a somewhat similar situation where the mail provider for my personal
account runs filtering software that usually
How do I do this? There is no rawheader or rawbody matcher as far as I
could determine.
There is 'rawbody', but it may or may not help you. I seem to recall the
Subject is prepended to the body text, but I don't recall if it is prepended
to rawbody. You could try it.
Short of that, you may
So how is this score arrived at?
I believe that scores of 0.001 are generally manually set, and not intended
to be anything other than a visible marker that the rule hit. That is
probably the case here.
Loren
What would be helpful here would be logging of when a rule *starts*
evaluation. Normally that would be painful, but for tracking a runaway it
would be useful. Perhaps I can code up something to capture that and log
it on a timeout...
Actually what sounds like it would be useful would be
I have to admit I'd never paid much attention to the RCVD_IN_DNSWL_* scores
on spam before.
Looking at spam for last month, I don't have a single RCVD_IN_DNSWL_MED.
But I do have 12 pretty blatant spams that hit RCVD_IN_DNSWL_HI.
It makes me wonder just how useful a rule it is.
Especially when
In v4.x, Unicode support will be better. That also means it may be easier
to make this sort of attack quieter in the future, as non-ASCII rules
won't be definitively wrong as they are now.
The question is whether non-ascii malicious rules could do anything more
damaging than simply failing to
Or maybe they should just write a plain text body:
Hello there, !
This is a test template...
Rather than just a direct translation of the obfuscated HTML:
Clipxuck thDe button belo2bw avnd ewnd the confiIrmtion stTodeps
(2) where would I go to look at building a plugin for this? Ideally
something that ends up upstream, but though I can write code, I know no
perl :).
Well, from the few I've seen, they all seem to have a relatively constant
structure. Someone pointed you to a plugin that is at least dealing in
None of these seem to accomplish disabling learning for a specific rule
I think the problem is that I believe Bayes works off of the total score,
and probably only sees rule names as more tokens, if it sees them at all. If
it indeed works off the total score, about all you can do is somehow
I found this little wonder in a bunch of spams I've been getting for the
last few days:
http://; http://; http://; http://; http://; http://;
href="http:/mi.wey.vandalized655bccemetries.cleaning/id>">unsubscribe here
I have no idea if that actually works, since I'm not about to try it.
The originating PHP script header helps people who run shared servers
track down the source of problematic mail. The two most common cases are:
Does this look valid?
X-PHP-Originating-Script: 48:class.phpmailer.php
Just looking at a dozen or so of the spams I've gotten in the last couple
I'm getting a lot of mails with some very curious headers in them.
I tried searching with Google, and it has never heard of many of these
strings.
Does anyone recognize what might be generating these headers?
X-EOPTenantAttributedMessage
X-EmailAdvisor
X-Mxtb-Transitionid
X-MG-Subscriptionuid
body NOT_INTERESTED /“[Nn]ot\S{1,5}[Ii]nterested\.?â€/
Might also be an interesting test. I assume the gibberish on the front and
back is quotes in some character set or another, but they seem a little
unlikely in a real mail.
Loren
And yet another rather amusing one from a crypto trading scam:
The BTC wallet which you have to send is:
1GF1DcYFpe MoA4Ttj6TeWPK sJFRV43JjYc (PLEASE REMOVE THE SPACES FROM THE WA=
LLET NUMBER)
Our trading system will automatically recognize your investment and start =
making profits for YOU!
Thanks Regards,
Billing Team
Defender Firewall Protection
+1, 888, 313, 1366
Perhaps memory fails, but was there not, once, a standard rule that
detected non alpha characters in
sender name? The domain/provider is not of interest for this question.
I think there was, but I suspect that the spam/ham ratio would be about
even, which is probably why it doesn't show up
From a fake "subscription" spam:
You can reach out
to our Customer Support Team+1 (800) 781 - 2511.
usly bogus that it avoids filter rules. :)
- Mark
On 6/17/2021 10:52 AM, users-digest-h...@spamassassin.apache.org wrote:
Subject: Re: Maybe it's time to revive EvilNumbers?
From: "Loren Wilton"
Date: 6/16/2021, 8:18 PM
To:
Here ar
Here are a handful of rules that work for me. Feel free to try them.
If you do, please let me know how they work for you.
(Apologies for my mail client trashing the formatting.
Be sure to check for possible line wrap on some of the rules!)
Loren
body LW_PAYMENT
My site is getting a lot of spam that is getting past spamassassin.
Because it has a phone number to call, rather than a link to login
using username and password. Mostly fake amazon purchases. They are
getting past a lot of URL block lists because of that. FWIW. - Mark
I have a
You could try
header X_SWITCH ALL =~ /^X-\$switch\b/sm
Loren
so you don't have points from body rules.
your mentioned URI_DEOBFU_INSTR is a meta rule:
meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST
so maybe it's not considered.
They are treated as header, or ignored if marked as net.
I think a bug report should be submitted for this.
I think the OP was trying to find a way to match "To: " to
"Hi user".
Loren
I'm trying to use the latest ExtractText plugin, but the docx2txt
program the plugin references is no longer available from
http://docx2txt.sourceforge.net
The latest version appears to be 1.4 from several years ago.
I just tried downloading the 1.4 version and the CVS version, and in both
.pro have a -1 with SUSP_URI_NTLD_PRO.
Is that really minus 1? Negative scores are good, they counteract spammy
scores, which are positive.
Loren
I could add another point between BAYES_999 and BAYES_99 scores but that
seems reactionary. Is there a better way? Should I throw in another point
for certain keywords in marketing emails like these?
For this specific message I might be inclined to add a rule to check for a
URL in the
While I haven't received a forged Amazon order email in this exact form,
there is all kinds of stuff here that could be caught with appropriate
rules.
"In-case you require any
change in order or like to cancel we recommend giving us call
immediately at "
"In-case" is unlikely in
Examples: https://pastebin.com/pF6Nmquc
Well, I can see a couple of simple rules that would catch these two, but I
don't know if they would also trip on legit mail.
List-Unsubscribe: m'http://180e977\.olink1\.xyz'
X-Mailer-SID: m'\b180e977_18\b'
3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
[score: 1.]
0.5 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
[score: 1.]
I have
5.0 BAYES_99 BODY: Bayes spam
We would need to see the original headers from the spam, or ideally the whole
spam before we could say anything. It would also be helpful to see the rules it
hit on your system.
Loren
I would not be so broad with that. I have 49 messages in my personal
archives with X-MC-User headers, none of which I have classified as spam.
Bill, do you see multiple X-MC- headers in the mails that come thru
MailChimp?
As in, "multiple many" or "multiple 2 or 3"? Or just the Users header?
Ah, OK. Looking at the MailChimp page, it appears that these headers appear on
a message being sent to MC, and then it extracts them, most likely removes them
from the final generated email, and uses them as processing instructions on how
to generate the email or sequence of emails. In any case
I've started seeing a number of spams with the following block of X headers
in it. I've never seen these before. While these look really fake to me
(from the content of most of them), does any real tool or site make headers
like this, or are they just from some spam tool and I can use them as a
In order to bring the SenderScore/ReturnPath DNS reputation and blocklist
rules up-to-date with their current ownership and administration, the
rules are being renamed:
RCVD_IN_RP_CERTIFIED -> RCVD_IN_VALIDITY_CERTIFIED
RCVD_IN_RP_SAFE -> RCVD_IN_VALIDITY_SAFE
RCVD_IN_RP_RNBL ->
I just got this little wonder, and was surprised that it got thru as ham.
From: "PayPal Billing"
I've fixed that locally, but I'd think SA ought to have a rule for "PayPal"
that doesn't come from paypal.
I just got what appears to be a legit email from my ISP.
It has a tracker tag pointing to 102.122.207.net.
Note that is a site name and not a dotquad.
Somehow this doesn't make me real comfortable with the possible veracity of
the email.
Has anyone come across 102.122.2O7.net before?
why is date important ?, spamassassin do test it already
DATE_IN_PAST *
Well, the date is a spam sign. That is good enough for me to be important.
And the DATE_IN_PAST * rules don't hit these spams.
Loren
and if you want to become an hero patches to document those evals are
always
welcome ;-)
Well, if I use undocumented code I have to figure out, I always do my own
documentation, since my memory these days is about five minutes long. The
trick for me will be figuring out how I could submit
I'm getting a lot of spams that all have a series of completely bogus
Received headers in them. A characteristic of these headers is a rather
improbable datestamp, considering today's date:
Received: from 69-171-232-143.mail-mail.facebook.com ([69.171.232.143])
by
Has anyone been getting spams from "ViraLife"? They have slowly started, one
by one, hitting all of my email inboxes. It shows up about once a week as a
"newletter". It claims to be from a legit email hosting company I know
nothing about, and I certainly have never signed up for this spam.
Right, but __STY_INVIS is currently tag-blind (it only looks for the
style="" clause), so it hits that, and if lots of ham is hiding tracking
images that way that might explain the poor S/O.
I suspect that might be the case.
The vast majority of invisible garbage I see is hidden in a ...
On 16 Dec 2020, at 23:21, Loren Wilton wrote:
I just got a batch of spams containing
Such rules are there. Unfortunately, for whatever reason, lots of ham
uses "invisible" text so it's not useful as a spam sign by itself and
it's hard to come up with any useful combination
I just got a batch of spams containing
That was followed by about 2K bytes of garbage containing GUIDs and links to
putatively some youtube video. The span was then terminated correctly, the
body of the spam, and then the same garbage for about another 2KB.
The small font rules didn't seem
That probably should have hit at least one scored base rule:
https://ruleqa.spamassassin.org/?rule=%2FFROM_2_
Nope. I think my rules are up to date, but maybe not.
Feel free to pastebin it and I'll take a look.
That probably should have hit at least one scored base rule:
https://ruleqa.spamassassin.org/?rule=%2FFROM_2_
Nope. I think my rules are up to date, but maybe not.
I just received a spam with this interesting From address:
From: "VA Rate Guide"
I wonder if it is worth checking for mail from more than one sender at once?
Loren
I don't have a Facebook account and don't know anyone on Facebook that would
send me mail (and don't want to!), so I have absolutely no idea if these
headers from recent spams are completely made up out of the air (and thus
spam signs) or are valid headers.
Can anyone tell me if this stuff is
Keep in mind that freedom of speech says that you can stand in the park on a
soapbox and shout.
It does NOT say that passers-by are forced to stand there and listen to you
until you run out of voice. They can walk away any time they want to.
It also does not say that the local newspaper is
You may also want to stick optional whitespace in there to avoid trivial
bypass:
There's also the possibility of adding a typeface or other options to the
tag, which would bypass your simple rule. And HTML is not
case-sensitive. And avoid * on complex stuff when matching arbitrarily
long
See rawbody_part_scan in the docs.
Also the chunking of the rawbody into 2-4 kB blocks, may make a
difference.
I wasn't able to find rawbody_part_scan in any of the docs that I managed to
find, but after digging into the source I found the chunking logic and dug
out the 2K limit. I'm not
basics of escaping at least *anything* won't do any harm
php > echo preg_quote('[^<]*<');
\\[\^\<\]\*\<
Well, escaping the [^<]* part certainly will do harm, since it will turn it
from a group match into individual characters that don't exist in the text
to be matched.
But I've tried
I'm getting lots of spams that are about 100+K long. The spam body contains
two blocks of random news text copied from fox news or msnbc or the like,
enclosed in a zero-point font block. I'm trying to match this simple pattern
to give some extra points, but I can't seem to get it to work. I'm
> Can you please tell me how to generate that report?
I believe he is asking for the results of something like
spamassassin -t
https://krebsonsecurity.com/2020/08/sendgrid-under-siege-from-hacked-accounts/
also sheds light on the issue too.
. SendGrid knows (or should know) that it has compromised accounts.
It could find out what some of them are for free by downloading Rob's list
of 25 or so compromised accounts. It