[ 
https://issues.apache.org/jira/browse/PDFBOX-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624344#comment-17624344
 ] 

Michael Klink commented on PDFBOX-5530:
---------------------------------------

{quote}Parsing such files seems to be an attack{quote}
I doubt it's an attack. In particular, I doubt it's an attempt to prevent 
_arbitrary loading_ by causing out-of-memory situations.

I think it's more likely that the creator of this document attempted to prevent 
_text and bitmap extraction_. Text extraction is made difficult by drawing the 
characters as vector graphics paths instead of using fonts, with the side 
effect of gigantic content streams. Bitmap extraction is made difficult by 
partitioning the bitmaps (of official-looking stamps) into thousands of tiny 
parts, which results in thousands and thousands of tiny bitmap images.
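
To give a feel for how quickly such partitioning multiplies the number of image 
objects, here is a minimal sketch in plain Java (no PDFBox involved). The class 
name TileSketch, the 400x400 bitmap size and the 4-pixel tile size are 
hypothetical values chosen for illustration only; they are not measured from 
the attached files, and this is not necessarily how the document was produced.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class TileSketch {

    /** Number of square tiles needed to cover a width x height bitmap. */
    static int tileCount(int width, int height, int tile) {
        int cols = (width + tile - 1) / tile;   // ceiling division
        int rows = (height + tile - 1) / tile;  // ceiling division
        return cols * rows;
    }

    /** Cuts the bitmap into tiles; edge tiles are clipped to the image bounds. */
    static List<BufferedImage> split(BufferedImage src, int tile) {
        List<BufferedImage> tiles = new ArrayList<>();
        for (int y = 0; y < src.getHeight(); y += tile) {
            for (int x = 0; x < src.getWidth(); x += tile) {
                int w = Math.min(tile, src.getWidth() - x);
                int h = Math.min(tile, src.getHeight() - y);
                // getSubimage shares the pixel data with src, so this is cheap
                tiles.add(src.getSubimage(x, y, w, h));
            }
        }
        return tiles;
    }

    public static void main(String[] args) {
        // Hypothetical stamp bitmap: 400x400 pixels, cut into 4x4 tiles
        BufferedImage stamp = new BufferedImage(400, 400, BufferedImage.TYPE_INT_RGB);
        System.out.println(split(stamp, 4).size()); // prints 10000
    }
}
```

Under these assumed numbers, a single stamp already becomes 10000 separate 
images, each of which would be its own image object in the PDF.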

> Java heap space
> ---------------
>
>                 Key: PDFBOX-5530
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5530
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.25
>            Reporter: liu
>            Priority: Blocker
>         Attachments: image-2022-10-20-14-30-19-790.png, 
> image-2022-10-20-14-30-57-332.png, image-2022-10-20-14-32-10-258.png, 
> image-2022-10-20-15-01-06-688.png, image-2022-10-20-19-07-42-632.png, 
> image-2022-10-20-19-08-23-932.png, screenshot-1.png, 引起宕机-1.pdf, 引起宕机.pdf
>
>
> Code (only this part of the code):
> PDDocument load = PDDocument.load(file, 
> MemoryUsageSetting.setupTempFileOnly(-1));
>  
> Hi. Why does it still take up so much memory even though I configured it 
> like this? What is the effect of using setupTempFileOnly?
> !image-2022-10-20-14-30-19-790.png!
> !image-2022-10-20-14-30-57-332.png!
> !image-2022-10-20-14-32-10-258.png!
> [^引起宕机.pdf]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

