Re: [clamav-users] Scanning PDF for phishing links

G.W. Haywood via clamav-users Tue, 29 Jun 2021 12:22:45 -0700

Hi there,

On Tue, 29 Jun 2021, Scott Q. via clamav-users wrote:

Lately I am receiving a lot of Spams originating within MS networks


I feel your pain.  At present I'm seeing 40,000 to 50,000 attempts per
month by Microsoft servers to send us spam.  It's gone from really bad
to almost unbelievable in the space of just a few weeks.  When it was
only a thousand or so I decided we'd live with it, but now the only
answer has been to blacklist AS8075 entirely and forward it all to the
spam reporting services.  I'm starting to see some results from that.
Having said that, I'm not seeing the same sort of thing that you are.
If you'd like to send me a sample privately I'll happily look at it.

with attached PDF's that basically contain an image with a link.

The body of the message is 7-8 random words such as: moka bu fyno da
zosi ku xiqy zy

These prove particularly difficult to filter and I'm thinking maybe
running the PDF's links through the phishing checks might help.


Is that possible or does anyone have other solutions for these
messages ?


Steve at Sanesecurity might be able to come up with something if you
submit a few samples to him.

For things like this I don't rely entirely on ClamAV and signatures,
but on a milter which dismantles the MIME parts and passes them to
clamd separately with a bit of extra logic.  Without something like
that you'll probably need to do a bit more work on the matching, as
you'll have to work with the whole message body and it might be big.
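
A minimal sketch of the "dismantle the MIME parts" step, assuming Python
and its standard email package.  This is only an illustration, not the
milter described above: the function name split_mime_parts and the
sample message are made up for the example, and handing each part to
clamd is deliberately left out.

```python
from email import message_from_bytes, policy


def split_mime_parts(raw_message: bytes):
    """Return a list of (content_type, decoded_bytes), one per leaf MIME part."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), payload))
    return parts


# Hypothetical message shaped like the spam described in this thread.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="us-ascii"\r\n'
    b"\r\n"
    b"moka bu fyno da zosi ku xiqy zy\r\n"
    b"--XYZ\r\n"
    b'Content-Type: application/pdf; name="a.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0xLjQK\r\n"
    b"--XYZ--\r\n"
)

for content_type, payload in split_mime_parts(SAMPLE):
    print(content_type, len(payload))
```

With the parts separated like this, each payload is small and already
decoded, so any extra logic (such as "short nonsense text plus a PDF")
can be applied per part instead of to the whole raw message.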

It should be possible to match the body with Yara rules, you might get
somewhere with a fairly simple regex along the lines of matching the
header parts enclosing the short text with one expression and the text
itself with another expression.  This is just a guess at the sort of
thing which might work, adjust the character ranges to suit the spam.
Just put this in a file called something.yar in the ClamAV database
directory and restart clamd (I'm assuming you're using clamav-milter
and clamd).

rule Microsoft_spam
{
strings:
        $body_1 = /content-type.{10,500}content-type.{10,100}application\/pdf/ nocase ascii
        $body_2 = /content-type: text\/plain.{20,70}(([a-z]{1,6})\s){6,8}/ nocase ascii
condition:
        all of them
}

The first regex matches the bit of the MIME-formatted message which
contains the header of the first part, the first body part, and just the
header of the second part.  I've assumed that the text precedes the
PDF part, it's usually that way but you'd have to tweak it if that's
not the case.  The second regex matches the first header (again) and
something resembling 6 to 8 space-separated words of 1-6 alphabetic
characters.  There are 20-70 characters of wiggle-room between the
content-type field and this group of words to allow for the rest of
the first header after the content-type field.  Again it might be
necessary to adjust that, but you'll probably find that the messages
aren't very creative and once it's set up it will match all of the
little blighters.
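
One way to experiment with the two expressions before giving them to
clamd is to try them against a saved sample outside ClamAV.  The sketch
below assumes Python's re module, which is not the regex engine ClamAV
uses for Yara rules, so treat the result as a rough check only.  The
DOTALL flag is an assumption added here so that "." can cross the line
breaks between headers; the function name and the sample message are
made up for the example.

```python
import re

# IGNORECASE mirrors "nocase"; DOTALL (an assumption) lets "." span line breaks.
FLAGS = re.IGNORECASE | re.DOTALL

BODY_1 = re.compile(
    rb"content-type.{10,500}content-type.{10,100}application/pdf", FLAGS
)
BODY_2 = re.compile(
    rb"content-type: text/plain.{20,70}(([a-z]{1,6})\s){6,8}", FLAGS
)


def matches_rule(raw_message: bytes) -> bool:
    """Mimic 'all of them': both expressions must match somewhere in the message."""
    return bool(BODY_1.search(raw_message)) and bool(BODY_2.search(raw_message))


# Hypothetical message shaped like the spam described in this thread.
SPAM_SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="us-ascii"\r\n'
    b"\r\n"
    b"moka bu fyno da zosi ku xiqy zy\r\n"
    b"--XYZ\r\n"
    b'Content-Type: application/pdf; name="a.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0xLjQK\r\n"
    b"--XYZ--\r\n"
)

print(matches_rule(SPAM_SAMPLE))
```

Running the same function over a handful of legitimate messages with
PDF attachments gives a quick feel for false positives before the
character ranges are tightened or loosened.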

You could do much the same sort of thing with ClamAV signatures but
for this kind of thing Yara rules are a lot more readable and much
easier to tweak when you're experimenting.  The one drawback at the
moment is that it's fairly easy to crash clamd with bad Yara rules.
On the bright side it seems OK with complex regexes and it's unlikely
that a crash would be exploitable, as it seems to crash as soon as it
tries to parse the bad rules rather than waiting until it comes across
a malicious bit of data.

It's important to avoid running into efficiency issues by having the
regexes attempt (and eventually fail) to match large chunks of what is
potentially a very large document many times over.  I don't know how
well the untested attempts above will achieve that.

HTH


--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
