Hi there,

On Fri, 14 Jan 2022, Andreas Wittig wrote:
I'd like to know, whether ClamAV scans attachments embedded in .msg files.
ClamAV can and will scan anything. It is a Unix-style application, by which I mean it is coded in ways unlike many (most?) Windows applications, where the name of the file actually matters. ClamAV does not use the file name to determine what kind of file it's working with, so you can't fool it by using the wrong filename extension (as you can probably still do on Windows :). In case you're wondering, ClamAV will *still* behave as a Unix-style application when it runs on Windows, because it's more or less the same code which is built for any OS.

Whether ClamAV will find what you're looking for is another question.

The specifications for the Microsoft proprietary .msg format have been revised a few times during its life. See for example

https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxmsg/b046868c-9fbf-41ae-9ffb-8de2bd4eec82

which lists the revisions. The documents there indicate that at least some of the file processing needed to use .msg files in the ways intended by Microsoft may be subject to one or more Microsoft patents. The patents won't affect scanning such files using signatures, but may possibly affect things like examining and unpacking the file contents.

It looks like it might be a huge undertaking to make a utility capable of handling all possible versions of an arbitrary .msg file, but there seem to be people attempting it in the open source world. Here's one example which seems to be actively developed:

https://github.com/TeamMsgExtractor/msg-extractor

Having said that, ClamAV has several different unpacking/uncompressing tools built into it, so it doesn't (usually) need to rely on foreign implementations of the archiving tools. When necessary it can extract files from archives before scanning (although see below about internal limits, of which there are quite a few - but they're documented).
The limits exist, for example, to protect against excessive resource usage, which under some circumstances may be as much of a threat as anything which might be scanned.

ClamAV can extract 7z archives, and 7-Zip can apparently extract attachments from .msg files, so you might get lucky.

There can be legal issues. The position can get murky when files are archived using some patented tool and need to be extracted. The 'rar' archive type is one such case which you can read about.

If ClamAV doesn't do all the unpacking that's needed then you might need to use something like msg-extractor, mentioned above. Personally my take on it is that if in order to *use* something I will need a licence from Microsoft, then whatever it is I don't want it. That doesn't mean I can't write signatures which ClamAV will use to match malicious (or other) content within it and scan it for threats, and it doesn't mean that I can't or won't write a little script to call from a milter to unpack something which is causing difficulties for ClamAV's unpacker.

The signatures which ClamAV uses determine not only what to look for but also where to look. So assuming the data doesn't cause some internal ClamAV limit to be exceeded, then if there is a signature which (a) is designed to scan the kind of file or data stream that you're working with and (b) matches something in (*anything* in) the data, then ClamAV will report that it's found it.

Perhaps what you're really asking is "Does ClamAV have signatures for things in .msg files?" The answer is "It doesn't matter what things are in, it only matters if there are signatures which will match them." In some cases the match needs to be as general as "This is a kind of file which I can't unpack, so I'm flagging it as suspicious", which is quite possibly, for some file archive formats, all you can legally do without a licence.
And, of course, if the attachment or whatever was encrypted before attaching it to the file, then with limited resources that is probably about all you can do anyway.
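To illustrate the point made earlier that the file name doesn't matter, here is a minimal Python sketch of identifying a file from its content instead. It is not ClamAV's code, and ClamAV's own file typing is far more thorough. The helper names 'looks_like_ole2' and 'sniff' are made up for this example. The only fact relied upon is that a .msg file is an OLE2 compound file, which starts with a fixed eight-byte header signature.

```python
# Illustrative only: identify a file by its leading bytes, not by its name.
# OLE2 (Compound File Binary) header signature; .msg files use this container.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data starts with the OLE2 compound-file signature."""
    return data.startswith(OLE2_MAGIC)


def sniff(path: str) -> str:
    """Classify a file from its first bytes; the extension is never consulted."""
    with open(path, "rb") as fh:
        head = fh.read(len(OLE2_MAGIC))
    if looks_like_ole2(head):
        return "ole2 (possibly .msg)"
    return "unknown"
```

Renaming a .msg file to something like 'holiday.jpg' makes no difference to 'sniff', which is the whole point. Note that many other formats (old Office documents, MSI installers) share the same container, hence the "possibly".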
I could not find an answer to this question in the documentation or FAQ.
Which documentation? Did you try the search function at docs.clamav.net? See for example https://docs.clamav.net/appendix/FileTypes.html?highlight=file%20types#file-types which begins with an explanation that any signature of Target Type 0 will be run against *all* files.
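As a toy model of what that page describes (each signature has a target type, and Target Type 0 is tried against everything), here is a short Python sketch. It is emphatically not how libclamav is implemented; real signatures support wildcards, offsets, logical combinations and much more. The names 'ToySig' and 'toy_scan' and the example signatures are invented for illustration. The only values taken from the ClamAV documentation are Target Type 0 (any file) and Target Type 2 (OLE2 containers).

```python
from dataclasses import dataclass
from typing import List

ANY_FILE = 0  # Target Type 0: signature is run against all files
OLE2 = 2      # Target Type 2: OLE2 containers


@dataclass(frozen=True)
class ToySig:
    name: str       # hypothetical signature name
    target: int     # target type the signature is restricted to
    pattern: bytes  # plain byte string to look for anywhere in the data


def toy_scan(data: bytes, detected_type: int, sigs: List[ToySig]) -> List[str]:
    """Return the names of signatures that apply to this type AND match."""
    hits = []
    for sig in sigs:
        applies = sig.target == ANY_FILE or sig.target == detected_type
        if applies and sig.pattern in data:
            hits.append(sig.name)
    return hits


# Two made-up signatures: one generic, one restricted to OLE2 containers.
SIGS = [
    ToySig("Toy.Generic.Marker", ANY_FILE, b"EVIL-MARKER"),
    ToySig("Toy.Ole2.Marker", OLE2, b"OLE-ONLY-MARKER"),
]
```

With data containing both markers, 'toy_scan' reports both names when the detected type is OLE2, but only the generic one for any other type. If no signature matches, nothing is reported, whatever the container.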
Also, I'd be interested to learn where to find that information in the source code to avoid future questions.
Questions are good. Mostly. :)

For most questions of this kind there is no single place where you would need to look. The source code is a bit difficult to understand if you aren't skilled with C. It is (has to be) a bit general purpose and I'm afraid it's rather sparsely commented.

There's a library of utilities called libclamav, and most of the tools use this library to do their jobs. To get a feel for how it all hangs together, look at clamav-0.104.x/clamscan/clamscan.c, which is a short and easily understood application. It calls the function 'scanmanager' in manager.c, which in turn sets some options and calls functions like 'scanstdin' to do most of the work. 'scanstdin' is relatively easy to follow: it calls a ClamAV library function 'cl_scanfile_callback' found in .../libclamav/scanners.c, and beyond that you go ever deeper into the rabbit-hole...

The documentation also mentions that there's a way for the ClamAV libraries to show you what they did during the scan, by leaving the temporary files created during the scan on disc instead of deleting them as would normally happen. See the 'clamscan' man page, in particular the '--leave-temps' option, which might help you to answer questions you have about what it has done during a scan. You might find the '--gen-json' option useful as well.

Here are some articles I came across while looking for help for you:

https://isc.sans.edu/forums/diary/Peeking+into+msg+files/22926/

https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/down-the-rabbit-hole-extracting-maliciousness-from-msg-files-without-outlook/

These show that it's fairly easy to dismantle these files - at least it was five years ago - and so it should be fairly easy to write signatures for things in them, with the proviso that nesting of the container formats is (a) perfectly possible with the .msg format and (b) very commonly used by malware authors to hide their activities.
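Going back to the '--leave-temps' and '--gen-json' options mentioned above, here is a small Python sketch of how one might drive clamscan from a script and keep its temporary files for inspection. The function names are made up, and 'sample.msg' and '/tmp/clamtmp' in the usage note are just placeholders. The options used and the exit statuses (0, 1, 2) are the ones listed in the clamscan man page; it assumes clamscan is installed and on the PATH, and whether '--gen-json' produces anything useful may depend on how your ClamAV was built.

```python
import subprocess
from typing import List

# Exit statuses as documented in the clamscan man page.
EXIT_MEANING = {
    0: "no virus found",
    1: "virus(es) found",
    2: "some error(s) occurred",
}


def build_clamscan_cmd(target: str, tempdir: str) -> List[str]:
    """Build a clamscan command line that keeps its temporary files."""
    return [
        "clamscan",
        "--leave-temps",         # don't delete temporary files after the scan
        f"--tempdir={tempdir}",  # put them somewhere easy to inspect
        "--gen-json",            # generate JSON metadata for the scanned file
        "--no-summary",          # skip the summary block at the end
        target,
    ]


def describe_exit(code: int) -> str:
    """Translate a clamscan exit status into words."""
    return EXIT_MEANING.get(code, f"unexpected exit status {code}")


def scan_keep_temps(target: str, tempdir: str) -> str:
    """Run clamscan (it must be installed) and describe the outcome."""
    proc = subprocess.run(
        build_clamscan_cmd(target, tempdir),
        capture_output=True,
        text=True,
    )
    return describe_exit(proc.returncode)
```

Calling scan_keep_temps("sample.msg", "/tmp/clamtmp") and then looking through the temporary directory should show which pieces ClamAV managed to unpack from the file, which is a more direct answer than reading the C source.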
Probably the main take-home point for me to make is that (with some exceptions) if ClamAV does *not* have a signature for any particular threat, then no matter *what* kind of file contains it, ClamAV will generally not alert you to the threat.

See other posts of mine in the archives for estimates of probabilities from observations made of our mail traffic over quite a number of years. Nowadays the vast majority of ClamAV detections here are made by my own signatures (YARA rules), with most of the remainder being made by third-party signatures from a few very useful sources such as Sanesecurity. I should add that we're a non-typical ClamAV user, in that we're not especially interested in finding viruses and we don't use Windows for anything; ClamAV here is mainly a spam detection tool, although of course it does occasionally flag examples of malware. Much of our protection is implemented by looking at the information we have about a message (such as where it came from) and not at the message content itself.

If you have examples of threats which ClamAV does not detect you might want to submit them to the ClamAV team for appraisal. You can do that with the Web interface at

https://www.clamav.net/reports/malware

or you can use a tool called 'clamsubmit' from the ClamAV suite.

There's no such thing as a free lunch. :/

HTH

--

73,
Ged.
