Paul,

I investigated further and now realize that it ISN'T double-extracting files 
from plain zips.  It is double-extracting files from zips nested inside other 
file formats, like TAR or raw image file formats.  For a plain zip, it detects 
the file entries twice, but doesn't extract them again if the parent file is 
a zip. 

I tested this by making a simple zip with two text files in it, then tar.gz'd 
it.  Scanning the zip.tar.gz file resulted in double-extraction of both text 
files.  
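
For anyone who wants to reproduce this, here is a minimal sketch of how such a 
test file could be built with Python's standard library.  The file names 
(a.txt, b.txt, test.zip, zip.tar.gz) and the function name are hypothetical, 
chosen only for illustration; they are not the exact files used in the test 
above.

```python
import io
import tarfile
import zipfile


def build_zip_in_targz(out_path="zip.tar.gz"):
    """Create a tar.gz that contains one zip holding two small text files."""
    # Build the inner zip in memory (names and contents are placeholders).
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", "first text file\n")
        zf.writestr("b.txt", "second text file\n")
    zip_bytes = zip_buf.getvalue()

    # Wrap the zip in a gzip-compressed tar archive.
    with tarfile.open(out_path, "w:gz") as tf:
        info = tarfile.TarInfo(name="test.zip")
        info.size = len(zip_bytes)
        tf.addfile(info, io.BytesIO(zip_bytes))
    return out_path


if __name__ == "__main__":
    print(build_zip_in_targz())
```

Scanning the resulting zip.tar.gz with clamscan should then exercise the 
zip-inside-TAR case described above.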

Funny story, the omni.ja file is not a real zip.  The author of the format 
decided to place the central directory header at the beginning of the file 
instead of at the end, resulting in a new zip-like file format.  We're able to 
parse out the files from omni.ja okay because we have self-extracting zip 
signatures that identify the individual file entries and because the omni.ja 
file itself is detected as "binary data" (so the ZIPSFX-in-a-ZIP exclusion rule 
does not apply). 
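
As a rough illustration of the layout difference, the sketch below checks 
whether a central directory header signature shows up before the first local 
file header signature.  This is only a byte-search heuristic written for this 
explanation, not how ClamAV identifies these files; the function names are 
made up, and the two signatures are the standard ones from the ZIP format 
specification.

```python
import io
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"   # local file header signature
CENTRAL_DIR_SIG = b"PK\x01\x02"    # central directory file header signature


def central_directory_first(data):
    """Return True if a central directory header precedes any local file header.

    Heuristic only: it searches for raw signature bytes, so compressed data
    that happens to contain the same bytes could confuse it.
    """
    local = data.find(LOCAL_HEADER_SIG)
    central = data.find(CENTRAL_DIR_SIG)
    if central == -1:
        return False
    return local == -1 or central < local


def make_plain_zip():
    """Build an ordinary zip in memory, with the central directory at the end."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", "hello\n")
    return buf.getvalue()


if __name__ == "__main__":
    print(central_directory_first(make_plain_zip()))
```

An ordinary zip written by a standard tool puts its local file headers first, 
so the check returns False for it.  A file laid out the way omni.ja is 
described above would be expected to return True.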

Anyhow, I now suspect that the omni.ja file in a tar.gz file will also get 
double-extracted.  The simplest option would be to disable 
file-type-recognition scans for embedded file formats in TAR files (and also 
GPT and other non-compressed archive file formats).  I had been wanting to do 
this anyway after investigating a closely related issue regarding ISO/GPT 
file formats.  This definitely gives us more reason to do so.

-Micah

On 4/10/20, 6:55 PM, "Paul Kosinski" <[email protected]> wrote:

    Is this a generic problem with compressed archives (like the Firefox
    ".tar.bz2") or is it zip specific? 
    
    If it is zip specific, there are 2 files in the Firefox distribution
    file that are zip format compressed which might explain the slowness.
    (They are both named omni.ja, but have different contents).
    
    
    
    On Fri, 10 Apr 2020 19:58:35 +0000
    "Micah Snyder (micasnyd)" <[email protected]> wrote:
    
    > One issue ClamAV currently has with scanning Zip archives is that
    > ClamAV's self-extracting zip detection logic has a flaw wherein it
    > detects every file within a zip as a new self-extracting zip.  As a
    > result, I believe (and I could be wrong on this), that Clam ends up
    > extracting and scanning every file in a zip *twice*.  I'm still
    > brainstorming the best way to fix this -- but I suspect this is a
    > large part of why zip-based file formats take much longer than
    > expected to scan. 
    > 
    > -Micah
    > 
    > 
    > Micah Snyder
    > ClamAV Development
    > Talos
    > Cisco Systems, Inc.
    >  
    > 
    > 
    > 
    > On 4/7/20, 1:38 PM, "clamav-users on behalf of Paul Kosinski via
    > clamav-users" <[email protected] on behalf of
    > [email protected]> wrote:
    > 
    >     I didn't want to screw around with my clamdscan (clamd.conf)
    > settings, so I ran my optioned-up clamscan command on a smaller and
    > much less complicated file. It took less than 11 seconds total time.
    > (My previous guess on clamscan's DB load time was apparently way off.)
    >     
    >     This suggests that the ClamAV scanning process really does take a
    > lot of CPU to deal with a big, complicated file like a Firefox
    > package: 
    >       time clamscan
    >            --alert-exceeds-max=yes --max-scantime=999999
    > --max-scansize=4090M --max-filesize=4090M --max-files=30000
    > --max-recursion=30 --pcre-match-limit=999999999
    > --pcre-max-filesize=999999999 audiofile.wav 
    >       audiofile.wav: OK
    >     
    >       ----------- SCAN SUMMARY -----------
    >       Known viruses: 6804144
    >       Engine version: 0.102.1
    >       Scanned directories: 0
    >       Scanned files: 1
    >       Infected files: 0
    >       Data scanned: 1.74 MB
    >       Data read: 1.73 MB (ratio 1.01:1)
    >       Time: 10.836 sec (0 m 10 s)
    >     
    >       real    0m10.851s
    >       user    0m10.439s
    >       sys     0m0.412s
    >     
    >     P.S. This is an actual audio intermediate file, not just random
    > bytes. 
    >     
    >     
    >     On Mon, 6 Apr 2020 21:50:15 -0700
    >     Al Varnell via clamav-users <[email protected]> wrote:
    >     
    >     > Much of that time is almost certainly being consumed by loading
    >     > the signature database into RAM. How long does it take using
    >     > clamdscan?
    >     > 
    >     > Sent from my iPad
    >     > 
    >     > -Al-
    >     > 
    >     > On Apr 6, 2020, at 12:29, Paul Kosinski via clamav-users
    >     > <[email protected]> wrote:  
    >     > > 
    >     > > It *does* take more than 120 secs for the clamscan command to
    >     > > fully scan the 62 MB Firefox installation file (.tar.bz2).
    >     > > Trying the scan with the default clamscan limits results in
    >     > > 62 MB "Data read" but *zero* "Data scanned"!    
    
    > 
    


_______________________________________________

clamav-users mailing list
[email protected]
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
