[
https://issues.apache.org/jira/browse/PDFBOX-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624344#comment-17624344
]
Michael Klink commented on PDFBOX-5530:
---------------------------------------
{quote}Parsing such files seems to be an attack{quote}
I doubt it's an attack. In particular, I doubt it's an attack meant to prevent
_arbitrary loading_ by causing out-of-memory situations.
I think it's more likely that the creator of this document attempted to prevent
_text and bitmap extraction_. Text extraction is made difficult by drawing the
characters as vector graphics paths instead of using fonts, with the side
effect of gigantic content streams. Bitmap extraction is made difficult by
partitioning the bitmaps (of official-looking stamps) into thousands of mini
parts, resulting in thousands upon thousands of tiny bitmap images.
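
One rough way to check a page for the pattern described above is to tally the operators in its content stream: many path construction operators and image draws, with few or no text showing operators. The sketch below is only an illustration under stated assumptions. OperatorTally is a hypothetical helper, not part of PDFBox, and it expects the content stream as already decoded plain text. It splits on whitespace only, so operator-like words inside string literals or inline image data would be miscounted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a PDFBox class): counts operators in an
// already decoded PDF content stream passed in as plain text.
final class OperatorTally {
    // Path construction operators: moveto, lineto, curve variants,
    // closepath and rectangle.
    private static final Set<String> PATH_OPS =
            new HashSet<>(Arrays.asList("m", "l", "c", "v", "y", "h", "re"));
    // Text showing operators.
    private static final Set<String> TEXT_OPS =
            new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));

    final long pathOps;
    final long textOps;
    final long xObjectDraws; // occurrences of the "Do" operator

    private OperatorTally(long pathOps, long textOps, long xObjectDraws) {
        this.pathOps = pathOps;
        this.textOps = textOps;
        this.xObjectDraws = xObjectDraws;
    }

    // Naive tokenization on whitespace; string literals and inline
    // images are not parsed, so this is a heuristic only.
    static OperatorTally of(String contentStream) {
        long path = 0;
        long text = 0;
        long draws = 0;
        for (String token : contentStream.trim().split("\\s+")) {
            if (PATH_OPS.contains(token)) {
                path++;
            } else if (TEXT_OPS.contains(token)) {
                text++;
            } else if ("Do".equals(token)) {
                draws++;
            }
        }
        return new OperatorTally(path, text, draws);
    }
}
```

A page whose tally shows a very large pathOps or xObjectDraws count and a textOps count near zero would be consistent with text drawn as outlines and stamps split into many small images, though the numbers alone do not prove the creator's intent.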
> Java heap space
> ---------------
>
> Key: PDFBOX-5530
> URL: https://issues.apache.org/jira/browse/PDFBOX-5530
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.25
> Reporter: liu
> Priority: Blocker
> Attachments: image-2022-10-20-14-30-19-790.png,
> image-2022-10-20-14-30-57-332.png, image-2022-10-20-14-32-10-258.png,
> image-2022-10-20-15-01-06-688.png, image-2022-10-20-19-07-42-632.png,
> image-2022-10-20-19-08-23-932.png, screenshot-1.png, 引起宕机-1.pdf, 引起宕机.pdf
>
>
> code (only this part of the code):
> PDDocument load = PDDocument.load(file,
> MemoryUsageSetting.setupTempFileOnly(-1));
>
> Hi. Why does it still take up so much memory when I configure it like this?
> What is the effect of using setupTempFileOnly?
> !image-2022-10-20-14-30-19-790.png!
> !image-2022-10-20-14-30-57-332.png!
> !image-2022-10-20-14-32-10-258.png!
> [^引起宕机.pdf]