[
https://issues.apache.org/jira/browse/PDFBOX-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840563#comment-17840563
]
Dieter von Holten commented on PDFBOX-5675:
-------------------------------------------
There is another problem with this file, which may be more or less connected to
the problem you are investigating.
On page 6 the file contains a stream of length 45,953,744 bytes, which is about
45 MB and makes up the major part of the file size.
_This_ size by itself should be no problem. The stream is Flate-encoded, that
is, compressed.
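To illustrate why the compressed length says little about the decoded size, here is a minimal sketch using only java.util.zip. It counts the decoded bytes of a zlib/Flate stream with a 'long' counter and never holds the decoded data in memory. The class and method names (FlateSizeProbe, deflate, decodedLength) are made up for this example and are not part of PDFBox, and I have not measured the decoded size of the stream in the attached file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration only; not PDFBox code.
public class FlateSizeProbe {

    /** Compresses raw bytes in zlib format (the format used by Flate-encoded PDF streams). */
    public static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    /** Counts the decoded bytes with a long counter, reusing one small buffer. */
    public static long decodedLength(byte[] compressed) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive data (all zeros) compresses extremely well.
        byte[] raw = new byte[10_000_000];
        byte[] compressed = deflate(raw);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("decoded bytes:    " + decodedLength(compressed));
    }
}
```

A probe like this could tell whether a stream is small enough to be loaded into a single byte[] before anything tries to do so.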
However, when I open the file in PDFDebugger and click on page 6, I get an
exception "Required array size too large" from
java.io.InputStream.readNBytes(), line 417 (in JDK 17). It is called from
InputStream.readAllBytes() (with Integer.MAX_VALUE), which in turn is called
from StreamPane.requestStreamText().
The internal buffer limit used in readNBytes() is Integer.MAX_VALUE-8, so this
method cannot read a byte[] from a stream larger than Integer.MAX_VALUE-8 bytes
(which usually is not a problem). The subclasses of InputStream seem to be able
to handle larger streams, but the call to InputStream.readNBytes() must be
avoided. The subclasses are a little questionable in this respect: they know
about 'long' positions and offsets, but in some places only 'int' is used.
Everything works well as long as things are well below 2 GB.
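One possible way to avoid the unbounded read is to cap how much is pulled into memory for display. The following is only a sketch under my own assumptions: BoundedRead, readPreview and PREVIEW_LIMIT are invented names, the 16 MB cap is arbitrary, and this is not how StreamPane is implemented. It relies on InputStream.readNBytes(int), which is available since Java 11.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not PDFBox code.
public class BoundedRead {

    // Arbitrary cap chosen for this example (assumption, not a PDFBox constant).
    public static final int PREVIEW_LIMIT = 16 * 1024 * 1024;

    /**
     * Reads at most 'limit' bytes from the stream. Unlike readAllBytes(),
     * the result can never grow beyond 'limit' bytes.
     */
    public static byte[] readPreview(InputStream in, int limit) throws IOException {
        return in.readNBytes(limit);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        // Stream longer than the limit: only the first 100 bytes are returned.
        byte[] head = readPreview(new ByteArrayInputStream(data), 100);
        System.out.println("preview bytes: " + head.length);
        // Stream shorter than the limit: everything is returned.
        byte[] all = readPreview(new ByteArrayInputStream(data), PREVIEW_LIMIT);
        System.out.println("all bytes: " + all.length);
    }
}
```

With a cap like this the viewer would show a truncated text for very large streams instead of failing.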
HTH
> org.apache.pdfbox.filter.Filter#decode() Java heap space
> --------------------------------------------------------
>
> Key: PDFBOX-5675
> URL: https://issues.apache.org/jira/browse/PDFBOX-5675
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 3.0.0 PDFBox
> Reporter: liu
> Priority: Major
> Attachments: 2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf,
> image-2023-09-05-15-05-50-168.png, image-2024-04-24-16-50-38-925.png,
> image-2024-04-24-18-33-17-524.png, image-2024-04-24-18-35-43-792.png,
> image-2024-04-24-19-25-22-904.png, image.png, screenshot-1.png,
> screenshot-2.png
>
>
> !image-2023-09-05-15-05-50-168.png!
> When converting the sixth page of this PDF file
> (2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf) to an image, the JVM runs
> out of heap space. Can you provide a way to store the output in a temporary
> file?
> {code:java}
> // JVM option used for this run: -Xmx2000m
> public static void main(String[] args) throws IOException, InterruptedException {
>     File file = new File("D:\\2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf");
>     // Buffer stream data in temp files rather than in main memory
>     PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
>     // PdfboxResourceCache is presumably the reporter's own class (not shown)
>     pdf.setResourceCache(new PdfboxResourceCache());
>     PDFRenderer renderer = new PDFRenderer(pdf);
>     renderer.setSubsamplingAllowed(true);
>     // Page index 5 is the sixth page; rendered at scale 0.125
>     BufferedImage bi = renderer.renderImage(5, 0.125f);
>     Thread.sleep(3600000);
>     pdf.close();
> }
> {code}