[jira] [Commented] (PDFBOX-5675) org.apache.pdfbox.filter.Filter#decode() Java heap space

Dieter von Holten (Jira) Wed, 24 Apr 2024 13:18:06 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840563#comment-17840563
 ]


Dieter von Holten commented on PDFBOX-5675:
-------------------------------------------

There is another problem with this file, which may be related to the problem you are investigating.

On page 6, the file contains a stream of length 45,953,744 bytes (about 45 MB), which accounts for most of the file size.

_This_ size by itself should not be a problem. The stream is Flate-encoded, i.e. compressed, so its decoded size can be much larger.
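To illustrate how far a Flate stream can expand: FlateDecode uses zlib/deflate compression, and highly repetitive data compresses extremely well. The following is a minimal sketch using only java.util.zip (no PDFBox involved; the class and method names are hypothetical). It compresses 64 MiB of zeros and then counts the decoded bytes in a long, without ever holding the decoded data in a single array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

class FlateExpansionDemo {

    // Compress `size` zero bytes with zlib/deflate (the format behind /FlateDecode)
    static byte[] deflateZeros(int size) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(new byte[size]);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decode again, counting bytes in a long; only one 8 KB chunk is held at a time
    static long inflatedSize(byte[] compressed) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        int size = 64 * 1024 * 1024; // 64 MiB of zeros
        byte[] compressed = deflateZeros(size);
        System.out.println(size + " bytes -> " + compressed.length + " compressed bytes");
        System.out.println("decoded again: " + inflatedSize(compressed) + " bytes");
    }
}
```

For all-zero input, deflate approaches its maximum ratio of roughly 1000:1, so a few tens of megabytes of compressed data can in principle decode to many gigabytes.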

However, when I open the file in PDFDebugger and click on page 6, I get the error "Required array size too large" from java.io.InputStream.readNBytes(), line 417 (in JDK 17). readNBytes() is called from InputStream.readAllBytes() with Integer.MAX_VALUE as the length, and readAllBytes() in turn is called from StreamPane.requestStreamText().

The maximum buffer size readNBytes() will allocate is Integer.MAX_VALUE - 8, so this method cannot read a byte[] from a stream larger than Integer.MAX_VALUE - 8 bytes (which usually is not a problem). Since the error occurs here, the page 6 stream apparently decodes to roughly 2 GB or more, although it is only about 45 MB compressed.

The subclasses of InputStream seem to be able to handle larger streams, but the call to InputStream.readNBytes() must be avoided. The subclasses are somewhat questionable in this respect: they know about 'long' positions and offsets, but in some places only 'int' is used. Everything works well as long as the data stays well below 2 GB.
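The array limit and the int/long issue can be made concrete. This is a minimal sketch, assuming Java 11+ (for OutputStream.nullOutputStream()); LargeStreams and its methods are hypothetical names, not PDFBox API. fitsInByteArray() checks a known decoded length against the Integer.MAX_VALUE - 8 limit before anything is read into a byte[], and countBytes() consumes a stream of any size while keeping the total in a long.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class LargeStreams {

    // Largest array InputStream.readNBytes()/readAllBytes() will build (JDK 17)
    static final long MAX_ARRAY_BYTES = Integer.MAX_VALUE - 8L;

    // Hypothetical helper: can a stream of this length be returned as one byte[]?
    static boolean fitsInByteArray(long length) {
        return length >= 0 && length <= MAX_ARRAY_BYTES;
    }

    // Hypothetical helper: measure a stream of any size. transferTo() copies through
    // a small internal buffer and returns the total as a long, so it does not wrap at 2 GB.
    static long countBytes(InputStream in) throws IOException {
        return in.transferTo(OutputStream.nullOutputStream());
    }
}
```

For comparison, narrowing a 3 GB offset to int wraps silently: (int) 3_000_000_000L evaluates to -1294967296.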

HTH


> org.apache.pdfbox.filter.Filter#decode() Java heap space
> --------------------------------------------------------
>
>                 Key: PDFBOX-5675
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5675
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: liu
>            Priority: Major
>         Attachments: 2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf, 
> image-2023-09-05-15-05-50-168.png, image-2024-04-24-16-50-38-925.png, 
> image-2024-04-24-18-33-17-524.png, image-2024-04-24-18-35-43-792.png, 
> image-2024-04-24-19-25-22-904.png, image.png, screenshot-1.png, 
> screenshot-2.png
>
>
>  !image-2023-09-05-15-05-50-168.png! 
> When converting the sixth page of this PDF file
> (2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf) to an image, an
> OutOfMemoryError (Java heap space) occurs. Can you provide a way to store the
> output in a temporary file?
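> {code:java}
> // Run with -Xmx2000m
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
>
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.io.IOUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> public class RenderPage6 {
>     public static void main(String[] args) throws IOException, InterruptedException {
>         File file = new File("D:\\2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf");
>         // Back PDFBox's stream cache with a temporary file instead of main memory
>         try (PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache())) {
>             // PdfboxResourceCache is presumably the reporter's own ResourceCache (not shown)
>             pdf.setResourceCache(new PdfboxResourceCache());
>             PDFRenderer renderer = new PDFRenderer(pdf);
>             renderer.setSubsamplingAllowed(true);
>             // Page index 5 is the sixth page; scale 0.125f renders at 1/8 size
>             BufferedImage bi = renderer.renderImage(5, 0.125f);
>             // Keep the JVM alive for one hour after rendering
>             Thread.sleep(3600000);
>         }
>     }
> }
> {code}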
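Regarding the question about storing the output in a temporary file: at the plain Java level, a decoded stream (for example what COSStream.createInputStream() returns in PDFBox) can be copied to disk with Files.copy(), which reports the size as a long. This is a minimal sketch with a hypothetical class name; it is not a PDFBox option, and it does not change how PDFRenderer decodes image data internally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SpillToTempFile {

    // Hypothetical helper: copy a (possibly multi-GB) decoded stream into a temp file.
    // Only a small copy buffer lives on the heap while the data is written to disk.
    static Path spill(InputStream decoded) throws IOException {
        Path tmp = Files.createTempFile("decoded-stream-", ".bin");
        tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        // createTempFile already created the file, so the copy must replace it
        Files.copy(decoded, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The caller can then read the temp file in chunks (or map parts of it) instead of asking for one byte[] of the whole stream.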



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

