[
https://issues.apache.org/jira/browse/PDFBOX-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745924#comment-17745924
]
Andreas Lehmkühler edited comment on PDFBOX-3712 at 7/22/23 1:36 PM:
---------------------------------------------------------------------
I've replaced the usage of ByteArrayOutputStream/ByteArrayInputStream with the
new RandomAccessReadWriteBuffer.
PDFBox now supports decoded streams larger than 2 GB, since the buffer stores
its data in chunks with a default size of 4 KB rather than in a single byte
array. The read/write buffer serves as both output and input, so it is no
longer necessary to copy the data to an intermediate byte array. As a result,
the memory footprint is reduced once more.
Additionally, the chunk size is adjusted according to the estimated stream size
so that we don't waste too much memory if a PDF contains lots of small streams,
such as the example from PDFBOX-5530.
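
To illustrate the idea, here is a minimal sketch of a chunked read/write buffer.
It is not the actual RandomAccessReadWriteBuffer code: the class ChunkedBuffer,
its methods, and the chunkSizeFor heuristic are hypothetical names and
assumptions made for this example. It shows how a list of fixed-size chunks
with a long length avoids the size limit of a single byte array, how the same
buffer can be written and then read without an intermediate copy, and how a
smaller chunk size might be chosen for small streams.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a chunked read/write buffer (not the PDFBox class). */
final class ChunkedBuffer {
    /** Default chunk size of 4 KB, as mentioned in the comment above. */
    static final int DEFAULT_CHUNK_SIZE = 4 * 1024;

    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private long size;     // total bytes written; a long, so it may exceed 2 GB
    private long readPos;  // current read position

    ChunkedBuffer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /**
     * Assumed heuristic: never allocate a chunk larger than the estimated
     * stream size, and fall back to the default if no estimate is available.
     */
    static int chunkSizeFor(long estimatedSize) {
        if (estimatedSize <= 0) {
            return DEFAULT_CHUNK_SIZE;
        }
        return (int) Math.min(estimatedSize, DEFAULT_CHUNK_SIZE);
    }

    /** Appends data, allocating a new chunk whenever the last one is full. */
    void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            int posInChunk = (int) (size % chunkSize);
            if (posInChunk == 0) {
                // Either nothing was written yet or all chunks are full.
                chunks.add(new byte[chunkSize]);
            }
            byte[] chunk = chunks.get(chunks.size() - 1);
            int n = Math.min(chunkSize - posInChunk, data.length - offset);
            System.arraycopy(data, offset, chunk, posInChunk, n);
            offset += n;
            size += n;
        }
    }

    /** Reads the next byte as 0..255, or returns -1 at the end of the data. */
    int read() {
        if (readPos >= size) {
            return -1;
        }
        byte[] chunk = chunks.get((int) (readPos / chunkSize));
        int value = chunk[(int) (readPos % chunkSize)] & 0xFF;
        readPos++;
        return value;
    }

    /** Resets the read position so the written data can be read from the start. */
    void rewind() {
        readPos = 0;
    }

    long length() {
        return size;
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

With this sketch, a stream estimated at 100 bytes would get a single 100-byte
chunk instead of a 4 KB one, while larger streams keep growing in 4 KB steps.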
> PDFBox goes into an infinite loop with this PDF
> -----------------------------------------------
>
> Key: PDFBOX-3712
> URL: https://issues.apache.org/jira/browse/PDFBOX-3712
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.4
> Reporter: Dirk Groeneveld
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-3712-page6-rendered.png
>
>
> The PDF at
> https://pdfs.semanticscholar.org/2095/e3df01fc32e0bff982a1e79600d5bcf10b81.pdf
> puts PDFBox into an infinite loop.
> This is roughly my code:
> {quote}
> final PDDocument pdDoc = PDDocument.load(inputStream);
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.getText(pdDoc);
> {quote}