Hi there,

On Wed, 7 Apr 2021, Micah Snyder (micasnyd) via clamav-users wrote:
There’s a lot of technical work to be done to safely raise that limitation, as large files of various file types have never been tested.
In my milter I've a pretty general-purpose Perl harness which can send data to clamd in flexible ways. It wouldn't take much effort to tweak it to run tests on clamd - in fact I've used it for that kind of thing in the past. If you'd like me to do some testing with large files and especially if you have some candidate large files which would be worth trying, I'd be happy to set a job running on an otherwise idle machine and cook rice puddings while waiting on the results. I have machines which I can cheerfully crash without worries. They're Pi4Bs, which if you leave them running for long enough will crash all by themselves.
A large TAR, for example, may well work fine when a large ZIP might crash the program. We really have no idea.
Do you have anything fuzzing the code, deliberately trying to break it, any even semi-automatic analysis? Seems like if you could break things into manageable blocks the community could help quite a bit. What would help most is a design document explaining the structure of the code, how it all hangs together, and the intended function of the various parts. Then people who would otherwise be overwhelmed by it all could get their teeth into it. It could pay enormous dividends if something like that were available to the community. Help in testing would be just the start.
A lot of folks seem to be unhappy with it saying “OK” when a file hasn’t been scanned (myself included). So we have been talking about changing the output to something like the following messages when files are not scanned or are only partially scanned:

* “SKIPPED (exceeded max file size)”
* “INCOMPLETE (exceeded max scan size)”

The exact wording is TBD. If anyone has any specific requests, I’d enjoy some help brainstorming.
Agreed, it's perverse to report "OK" if a file was not properly scanned, but since it's been that way for decades I think you'll probably break an awful lot of stuff Out There if you just go ahead and change that. A compile-time option, initially defaulting to the current behaviour, or a configuration option (the default behaviour as now) might prevent a lot of angst. No issues with the suggested wordings that I can see, as long as they don't turn out to be a moving target. There should be another one, perhaps something like "DUNNO", for things nobody thought of yet, possibly including "SKIPPED (below minimum file size)". Please also put something in the docs reserving the right to add new replies, so that coders get into the habit of coding for the future, or so that at the, er, barest minimum your @r$e is covered.
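To illustrate what "coding for the future" might look like on the client side, here is a minimal Python sketch of a reply parser that treats only an explicit "OK" as clean and puts anything it does not recognise into a catch-all bucket. The names `ScanResult`, `parse_reply` and `is_clean` are made up for this example, and the "SKIPPED"/"INCOMPLETE" strings are only the proposed wordings from this thread, not anything clamd emits today. It also assumes the usual `<target>: <rest>` reply shape and splits on the first ": ", which would misparse a path that itself contains ": ".

```python
from typing import NamedTuple


class ScanResult(NamedTuple):
    target: str   # what was scanned, e.g. "stream" or a path
    status: str   # "OK", "FOUND", "ERROR" or "UNKNOWN"
    detail: str   # signature name, error text, or the unrecognised reply


def parse_reply(line: str) -> ScanResult:
    """Parse one clamd reply line, tolerating statuses we have never seen."""
    # Assumption: replies look like "<target>: <rest>"; split on the first ": ".
    target, _, rest = line.strip().partition(": ")
    if rest == "OK":
        return ScanResult(target, "OK", "")
    for status in ("FOUND", "ERROR"):
        if rest.endswith(" " + status):
            return ScanResult(target, status, rest[: -len(status) - 1])
    # Anything else (e.g. a future "SKIPPED (...)") is kept, not guessed at.
    return ScanResult(target, "UNKNOWN", rest)


def is_clean(result: ScanResult) -> bool:
    """Only an explicit OK counts as clean; unknown replies never do."""
    return result.status == "OK"
```

With this shape, a reply such as "big.iso: SKIPPED (exceeded max file size)" would come back as UNKNOWN with the full text preserved in `detail`, so the caller can log it or quarantine the file instead of silently passing it.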
... Some file formats, like PDF, DMG, and ZIP* store metadata at the end of the file ... zips are actually pretty easy to parse in-order ... Files like DMG, on the other hand, can’t even be identified as DMG’s without reading the end of the file first ...
Is there somewhere a document listing the file types of which ClamAV is aware, how it parses them, and any specific limitations/issues? Whenever I've delved into the code it's been pretty daunting to try to work out some of that stuff.
In short, don’t send chunks of files as separate files to be scanned; It probably won’t catch any malware that way and may print lots of warnings or errors if it gets confused about the type of the file and starts processing it with the wrong parser.
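For contrast, the "chunks" in the clamd man page are framing within a single INSTREAM session: each chunk is a 4-byte length in network byte order followed by the data, and a zero-length chunk ends the stream, so clamd still sees one whole file. A rough Python sketch of that framing follows. The function names, the chunk size, and the host and port (127.0.0.1:3310, a common TCPSocket setting) are assumptions for illustration, and the total sent must stay within StreamMaxLength from clamd.conf.

```python
import socket
import struct
from typing import Iterator


def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte sequence for one INSTREAM session, piece by piece."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    yield b"zINSTREAM\0"  # 'z' prefix: the command is NUL-terminated
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # 4-byte unsigned length in network byte order, then the chunk itself
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # a zero-length chunk ends the stream


def scan_bytes(data: bytes, host: str = "127.0.0.1", port: int = 3310) -> str:
    """Send data to clamd as ONE stream and return its reply line."""
    with socket.create_connection((host, port), timeout=30) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = b""
        # With the 'z' prefix the reply is NUL-terminated as well.
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", "replace")
```

However small the chunks are, everything between the command and the zero-length terminator is scanned as a single object, which is quite different from submitting each piece as its own file.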
I think the OP was confused by the use of 'chunks' in the clamd 'man' page, which refers to the API for streaming data to clamd rather than any suggestion that files can be broken into parts which will then be scanned separately. Clearly I can scan any known malicious file four bytes at a time to guarantee a clean result.

--
73, Ged.
