[ https://issues.apache.org/jira/browse/PDFBOX-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maruan Sahyoun resolved PDFBOX-4162.
------------------------------------
Resolution: Fixed
Resolving per the reporter's feedback.
> OutOfMemoryError in PDExtendedGraphicsState#getLineDashPattern
> --------------------------------------------------------------
>
> Key: PDFBOX-4162
> URL: https://issues.apache.org/jira/browse/PDFBOX-4162
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.8
> Reporter: Andreas Hubold
> Assignee: Andreas Lehmkühler
> Priority: Critical
> Fix For: 2.0.10, 3.0.0 PDFBox
>
>
> I'm getting an OutOfMemoryError from PDFBox when parsing a certain PDF using
> the Apache Tika App v 1.17 - which uses PDFBox 2.0.8 internally. This is
> reproducible even with 8GB heap.
>
> The OutOfMemoryError happens in
> org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState#getLineDashPattern,
> which contains this piece of suspicious code:
> {code:java}
> COSArray dp = (COSArray) dict.getDictionaryObject( COSName.D );
> if( dp != null )
> {
> COSArray array = new COSArray();
> dp.addAll(dp);
> {code}
> The last line is wrong. It appends all elements of 'dp' to 'dp' itself,
> doubling the stored list on every call. Maybe the intention was to add them
> to the newly created array instead.
>
> Stacktrace:
> {noformat}
> [Full GC (Allocation Failure) 4225609K->4224664K(5989888K), 32,9544686 secs]
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3210)
> at java.util.Arrays.copyOf(Arrays.java:3181)
> at java.util.ArrayList.grow(ArrayList.java:261)
> at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
> at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
> at java.util.ArrayList.addAll(ArrayList.java:579)
> at org.apache.pdfbox.cos.COSArray.addAll(COSArray.java:124)
> at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getLineDashPattern(PDExtendedGraphicsState.java:280)
> at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.copyIntoGraphicsState(PDExtendedGraphicsState.java:89)
> at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:61)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
> at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
> at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
> at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
> at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
> at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
> at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:205)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:486)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145){noformat}