On Wed, 6 Jan 2021, Bill Cole wrote:

John,

1st: thank you very much for working on generated rules and for all the rest of your work on rules.

I am curious whether these very long regexes have been proven to actually work in full, or whether they might be getting quietly mishandled. I don't see any hits on either rule on any of the mail systems I work with going back a month, so I wonder whether it is worthwhile to construct test messages that should hit on elements in the latter parts of the patterns, or whether you've already done such tests.

I did some quick searching to see if there's any documented maximum RE size and couldn't find one. I'd wager any limit depends on available resources rather than being a hard size limit.

If there is a hard length limit on REs, and SA silently breaks when that limit is exceeded, these rules don't appear to have hit it yet:

Jan  6 09:28:07.206 [30594] dbg: rules: ran header rule __REPTO_419_FRAUD_GM_LOOSE ======> got hit: "[email protected]"
Jan  6 09:28:07.207 [30594] dbg: rules: ran header rule REPTO_419_FRAUD_GM ======> got hit: "[email protected]"

Jan  6 09:28:40.398 [30728] dbg: rules: ran header rule REPTO_419_FRAUD ======> got hit: "[email protected]"
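Bill's suggested check (do entries near the end of a very long pattern still match?) can be sketched as a quick self-test. This is a minimal sketch assuming the rule reduces to one big alternation of literal addresses. The address list is synthetic and hypothetical, and Python's `re` engine is not the Perl engine SpamAssassin uses, so this only illustrates the testing approach, not SA's actual limits. For the real rules, the equivalent would be test messages whose Reply-To carries an address taken from the tail of the pattern.

```python
import re


def build_alternation(addresses):
    # Escape each literal address and join them into one large alternation,
    # anchored so the whole value must equal one of the listed addresses.
    # NOTE: Python's re engine, not Perl's; illustrative only.
    body = "|".join(re.escape(a) for a in addresses)
    return re.compile(r"^(?:" + body + r")$", re.IGNORECASE)


# Hypothetical synthetic addresses, not taken from the real rules.
addresses = [f"scammer{i}@example.com" for i in range(5000)]
pattern = build_alternation(addresses)

# Probe the first, middle, and last entries: if the tail of a very long
# pattern were silently dropped, the last probe would fail to match.
probes = [addresses[0], addresses[len(addresses) // 2], addresses[-1]]
results = {p: bool(pattern.match(p)) for p in probes}
print(len(pattern.pattern), "chars;", results)
```

Probing the last entry is the important part: a truncated or mishandled pattern would most likely lose its tail, which is exactly where a month of zero hits would go unnoticed.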


But in retesting this, I did find and fix a minor RE error that caused it to miss addresses in yahoo.com.XX.
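The message doesn't say what the RE error actually was. Purely as a hypothetical illustration of this general failure mode, a domain pattern that ends at `yahoo.com` misses country-suffixed domains, while an optional two-letter suffix group catches them. This uses Python `re` syntax; the real rules are Perl regexes whose exact form isn't shown here.

```python
import re

# Hypothetical "too strict" pattern: the domain must end at yahoo.com,
# so country-suffixed domains such as yahoo.com.hk are missed.
too_strict = re.compile(r"@yahoo\.com$", re.IGNORECASE)

# Hypothetical fix: allow an optional two-letter country-code suffix.
with_cc = re.compile(r"@yahoo\.com(?:\.[a-z]{2})?$", re.IGNORECASE)

samples = [
    "someone@yahoo.com",
    "someone@yahoo.com.hk",
    "someone@yahoo.com.example",  # longer suffix: neither pattern matches
]
for s in samples:
    print(s, bool(too_strict.search(s)), bool(with_cc.search(s)))
```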

Thanks for asking!


There's no guarantee you'll see hits. The only feed I have for this is 419 spams sent to me and my wife, plus a few 419 spamples that others have provided, so the sample set is probably rather small, even though it feels like I get a metric buttload of such garbage.

If anybody has a well-vetted 419 scam corpus and would be willing to extract Reply-To addresses from it to contribute to this, feel free to contact me privately.
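For anyone considering contributing, pulling Reply-To addresses out of a corpus is straightforward with Python's standard `email` package. This is a minimal sketch assuming the corpus is already available as raw message bytes (reading mbox or maildir files is left out); the function name `reply_to_addresses` and the sample messages are hypothetical.

```python
import email
from email.utils import getaddresses


def reply_to_addresses(raw_messages):
    """Yield lowercased Reply-To addresses from raw RFC 5322 messages (bytes)."""
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        # A message may carry zero or more Reply-To headers.
        values = msg.get_all("Reply-To", [])
        # getaddresses() splits display names from the actual addresses.
        for _name, addr in getaddresses(values):
            if addr:
                yield addr.lower()


# Hypothetical sample messages standing in for a real 419 corpus.
sample = [
    b"From: a@example.org\r\nReply-To: \"Mr. X\" <X.Agent@Example.NET>\r\n"
    b"Subject: hi\r\n\r\nbody\r\n",
    b"From: b@example.org\r\nSubject: no reply-to\r\n\r\nbody\r\n",
]
found = sorted(set(reply_to_addresses(sample)))
print(found)  # ['x.agent@example.net']
```

The output would still need vetting before use, since legitimate senders also set Reply-To.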


The reason I decided to do this is that I'm still getting 419 pitches with Gmail contact addresses that I started reporting *more than six months ago* (and continue to report every time I get another one). I would assume that if Google had actually suspended those accounts after (multiple) reports, the 419 scammers would have stopped using those contact addresses in their pitches, because they couldn't receive replies. So it looks to me like Google just doesn't give a shit about my 419 collector mailbox reports. However, I recognize that assumption may be flawed, and anyone with actual contacts inside the Gmail administrative team is invited to email me privately to discuss this.

I figure if I'm still seeing them, others are probably seeing them too and would benefit from having them scored in addition to the body-based 419 scam tests.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]                         pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Je ne suis pas Charlie. Je suis armé. (I am not Charlie. I am armed.)
-----------------------------------------------------------------------
 Tomorrow: the 6th anniversary of the Charlie Hebdo massacre
