Hi ClamAV team and users,

This is a follow-up to my previous posts, which can be found 
here<https://lists.clamav.net/pipermail/clamav-users/2024-February/013744.html> 
& 
here<https://lists.clamav.net/pipermail/clamav-users/2024-February/013744.html>.
I wanted to give a summary and make sure the problem we identified is clear.


My team and I have noticed that ClamAV can be very slow when scanning certain 
PDF files. When we investigated, we found a potential root cause in the ClamAV 
source code. The code at 
https://github.com/Cisco-Talos/clamav/blob/5f934c16b47591157a7082b71e751c45f095e2c8/libclamav/pdf.c#L1984
handles PDF document tags. This function keeps a state so that it can properly 
handle tags that require parameters. However, this state is not reset after the 
parameters are parsed, so the result of parsing depends on the order in which 
tags are listed in the dictionary.



For example, a PDF object with the following header scans quickly because the 
image subtype comes before any filter:



```
429 0 obj << /ColorSpace /DeviceRGB /Name /im56 /Height 2850 /Subtype /Image 
/Filter /FlateDecode /DecodeParms << /Columns 1776 /Colors 3 /Predictor 2 >> 
/Type /XObject /Width 1776 /Length 25686 /BitsPerComponent 8 /Interpolate true 
>> stream
```

However, a PDF object with the following header scans slowly because the image 
subtype comes after the filter (the image is dumped, though it should not be):

```
454 0 obj<</Length 455 0 R/Filter/FlateDecode/DecodeParms<</Columns 
1776/Predictor 2/Colors 3>>/Width 1776/Height 2850/BitsPerComponent 
8/ColorSpace/DeviceRGB/Interpolate 
true/Type/XObject/Name/im56/Subtype/Image>>stream
```
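
To triage which objects have the slow ordering, a rough check like the one 
below may help. It is only a sketch: it assumes the object header is available 
as plain text (as in the two examples above), it uses a regex rather than a 
real PDF parser, and the function name is made up for illustration.

```python
import re

# Heuristic only: pattern matching over raw header text, not a real PDF parser.
FILTER_RE = re.compile(r"/Filter\b")
IMAGE_RE = re.compile(r"/Subtype\s*/Image\b")


def filter_precedes_image_subtype(header: str) -> bool:
    """Return True if /Filter appears before /Subtype /Image in the header."""
    filter_match = FILTER_RE.search(header)
    image_match = IMAGE_RE.search(header)
    if filter_match is None or image_match is None:
        # One of the two entries is missing, so the ordering question does not apply.
        return False
    return filter_match.start() < image_match.start()
```

A header for which this returns True has the same ordering as the slow example; 
it does not by itself prove that the scan will be slow.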


Finally, at this line: 
https://github.com/Cisco-Talos/clamav/blob/5f934c16b47591157a7082b71e751c45f095e2c8/libclamav/pdf.c#L1580
we see references to the decode parameters, but they are only used after the 
tags have been parsed. Neither DP nor DecodeParms is in `pdfname_actions`, so 
they do not affect the state.
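
To make the ordering problem concrete, here is a toy model in Python. It is 
not the code from pdf.c: the table, the state names, the flags and the 
`reset_after_value` switch are all hypothetical and only mimic the behaviour 
described above (a name that is honoured only in the initial state, a filter 
state that is never left, and names such as DecodeParms that have no entry).

```python
# Toy model only: names, states and transitions are hypothetical, not from pdf.c.
STATE_NONE, STATE_FILTER = "none", "filter"

# name -> (state required to act on the name, state entered afterwards, flag set)
# A required state of None means "any state".
ACTIONS = {
    "Filter": (None, STATE_FILTER, "has_filters"),
    "FlateDecode": (STATE_FILTER, STATE_FILTER, "flate"),
    "Image": (STATE_NONE, STATE_NONE, "image"),
}


def parse_names(names, reset_after_value=False):
    """Walk the dictionary names in order and return the set of flags raised."""
    state = STATE_NONE
    flags = set()
    for name in names:
        action = ACTIONS.get(name)
        if action is None:
            # No table entry (e.g. DecodeParms, Width): the state is left untouched.
            continue
        required, next_state, flag = action
        if required is None or required == state:
            flags.add(flag)
            state = next_state
            if reset_after_value and required == STATE_FILTER:
                # Hypothetical fix: leave the filter state once its value is consumed.
                state = STATE_NONE
    return flags


# Shortened versions of the two orderings shown earlier in this message.
FAST_ORDER = ["Subtype", "Image", "Filter", "FlateDecode", "DecodeParms", "Width"]
SLOW_ORDER = ["Filter", "FlateDecode", "DecodeParms", "Width", "Subtype", "Image"]

print("image" in parse_names(FAST_ORDER))                          # True
print("image" in parse_names(SLOW_ORDER))                          # False
print("image" in parse_names(SLOW_ORDER, reset_after_value=True))  # True
```

In this model the object is recognised as an image only when the image subtype 
comes first, or when the state is reset after the filter value. Whether a reset 
at that point is the right fix for the real parser is for the maintainers to 
judge.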



Slow PDF scanning has been a known problem for 3 years, and it would be nice to 
see it addressed in a new patch soon.



Again, I’m happy to provide more details if needed. Thank you for your time.



Best,

Eric


