Not a ClamAV developer, but I would expect two key reasons to be logical
signatures and hash signatures.
Logical signatures may have an "anchor" pattern that appears late in the
file, with other patterns found throughout earlier parts of the file
(this depends on the specific signature). Some YARA rules may have
similar constraints.
Hash signatures can't reasonably be compared until the complete file is
buffered by clamd.
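
To illustrate the hash point, here is a minimal Python sketch (not ClamAV's code; the function names are made up for this example). A digest can be updated chunk by chunk as data arrives, but there is nothing meaningful to compare against a hash signature until the last byte has been fed in:

```python
import hashlib


def stream_digest(chunks):
    """Feed chunks into SHA-256 as they arrive and return the final hex digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        # Cheap incremental update, but the value is not final yet.
        h.update(chunk)
    # Only now, after the last chunk, does the digest describe the whole file.
    return h.hexdigest()


def matches_known_hash(chunks, known_hashes):
    """Hypothetical helper: compare the finished digest against a set of hex digests."""
    return stream_digest(chunks) in known_hashes
```

The digest of a prefix says nothing about the digest of the full file, so the lookup can only happen at end of stream, whether or not the data was buffered along the way.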
Also, some other signatures may be location-anchored to the end of the file.
I don't think the scan of any individual file is parallelized either;
you get parallelism by opening multiple connections to scan multiple
files at the same time. And progressive scanning doesn't seem to me like
something that would give much benefit; the actual scan time is pretty
quick. It's parsing the signatures into their in-memory structures for
scanning that takes all the time.
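
A rough sketch of the "multiple connections" idea in Python, assuming you already have some function that opens its own connection to clamd and scans one file. Both `scan_many` and the `scan_one` callable are hypothetical names for this example, not part of any ClamAV client library:

```python
from concurrent.futures import ThreadPoolExecutor


def scan_many(paths, scan_one, max_workers=4):
    """Scan several files concurrently, one connection per call to scan_one.

    scan_one is a caller-supplied function (an assumption of this sketch) that
    opens its own connection to clamd, scans a single path, and returns the reply.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(scan_one, paths))
    return dict(zip(paths, results))
```

Each file is still scanned as a single unit; the concurrency comes only from having several scans in flight at once.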
-kgd
Akshit Jain via clamav-users wrote:
Hi Team,
This is regarding the issue discussed here:
https://github.com/Cisco-Talos/clamav/issues/1424.
I have observed similar behavior with the *INSTREAM* command — the data
is not scanned incrementally as it is streamed to the socket. Instead,
the scan only begins after all bytes have been written to the socket.
This makes it functionally similar to a normal *SCAN* operation, and
therefore, it doesn’t provide the expected advantage of parallel or
progressive scanning.
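
For reference, the framing that INSTREAM uses on the wire can be sketched in Python as below. This follows the chunk format described in the clamd man page (a 4-byte length in network byte order followed by the data, ended by a zero-length chunk); the function name `instream_frames` is made up for this example and the sketch only builds the bytes, it does not talk to clamd:

```python
import struct


def instream_frames(chunks):
    """Yield the byte frames a client sends for one INSTREAM request."""
    # Null-terminated command form ("z" prefix).
    yield b"zINSTREAM\0"
    for chunk in chunks:
        if chunk:  # skip empty chunks: a zero length would end the stream early
            # 4-byte unsigned length in network byte order, then the data.
            yield struct.pack("!I", len(chunk)) + chunk
    # A zero-length chunk terminates the stream.
    yield struct.pack("!I", 0)
```

A client would write each frame to the clamd socket in turn and then read the reply. The reply only comes back after the terminating zero-length chunk, which matches the behavior described above.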
I’d like to use this platform to better understand the *intended
behavior and limitations*:
*a.* Why doesn’t ClamAV support incremental scanning during streaming?
Are there architectural or technical constraints that prevent
implementing this functionality?
*b.* Could we also consider updating the official documentation to
explicitly clarify the behavior of *INSTREAM*? Many users assume that it
performs scanning while data is being streamed, which seems to be a
common misunderstanding.
For example, the documentation could include a note such as:
/“The INSTREAM command does not perform incremental scanning. File
data is fully buffered before scanning begins. INSTREAM is intended
for transferring files to ClamD for scanning, not for real-time or
progressive scanning of streamed data.”/
This clarification would help set the right expectations for developers
and users integrating with ClamD.
Thanks and regards,
Akshit Jain
_______________________________________________
Manage your clamav-users mailing list subscription / unsubscribe:
https://lists.clamav.net/mailman/listinfo/clamav-users
Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation
https://docs.clamav.net/#mailing-lists-and-chat