[ 
https://issues.apache.org/jira/browse/PDFBOX-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745924#comment-17745924
 ] 

Andreas Lehmkühler commented on PDFBOX-3712:
--------------------------------------------

I've replaced the usage of ByteArrayOutputStream/ByteArrayInputStream with the 
new RandomAccessReadWriteBuffer. PDFBox now supports decoded streams larger 
than 2 GB, as the buffer stores its data in chunks with a default size of 4 KB. 
The rw-buffer is used as both output and input, so it is no longer necessary to 
copy the data to an intermediate byte array. As a result, the memory footprint 
is reduced once more. Additionally, the chunk size is adjusted according to the 
estimated stream size so that we don't waste too much memory if a PDF contains 
lots of small streams, such as the example from PDFBOX-5530.
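
To illustrate the idea, here is a minimal sketch of a chunked read/write 
buffer. It is not the actual RandomAccessReadWriteBuffer code: the class 
ChunkedBuffer, its methods, and the chunkSizeFor heuristic (including the 
64-byte lower bound) are hypothetical names and values chosen for this 
example. It only shows why a list of small chunks addressed by a long 
position avoids the 2 GB limit of a single byte array, and how the same 
object can be written to and then read back without an intermediate copy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunked read/write buffer; NOT PDFBox's actual implementation. */
final class ChunkedBuffer {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private long size;     // total bytes written; a long, so not capped at 2 GB
    private long readPos;  // position of the next byte to read

    ChunkedBuffer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Assumed heuristic: small chunks for small streams, capped at 4 KB. */
    static int chunkSizeFor(long estimatedLength) {
        final int defaultSize = 4096;
        final int minSize = 64; // arbitrary lower bound for this sketch
        if (estimatedLength <= 0) {
            return defaultSize; // unknown size: fall back to the default
        }
        return (int) Math.max(minSize, Math.min(defaultSize, estimatedLength));
    }

    void write(int b) {
        int offset = (int) (size % chunkSize);
        if (offset == 0) {
            chunks.add(new byte[chunkSize]); // allocate a chunk only when needed
        }
        chunks.get(chunks.size() - 1)[offset] = (byte) b;
        size++;
    }

    void write(byte[] data) {
        for (byte b : data) {
            write(b);
        }
    }

    /** Moves the read position; the written data stays in place. */
    void seek(long position) {
        if (position < 0 || position > size) {
            throw new IllegalArgumentException("position out of range: " + position);
        }
        readPos = position;
    }

    /** Returns the next byte as 0..255, or -1 at the end of the data. */
    int read() {
        if (readPos >= size) {
            return -1;
        }
        byte[] chunk = chunks.get((int) (readPos / chunkSize));
        int value = chunk[(int) (readPos % chunkSize)] & 0xFF;
        readPos++;
        return value;
    }

    long length() {
        return size;
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

In this sketch a caller would write the decoded bytes, call seek(0), and 
then read from the same object, so no second array holding the whole stream 
is ever allocated.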


> PDFBox goes into an infinite loop with this PDF
> -----------------------------------------------
>
>                 Key: PDFBOX-3712
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3712
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.4
>            Reporter: Dirk Groeneveld
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: PDFBOX-3712-page6-rendered.png
>
>
> The PDF at 
> https://pdfs.semanticscholar.org/2095/e3df01fc32e0bff982a1e79600d5bcf10b81.pdf
>  puts PDFBox into an infinite loop.
> This is roughly my code:
> {quote}
> final PDDocument pdDoc = PDDocument.load(inputStream);
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.getText(pdDoc);
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

