Jim Kay created PDFBOX-1907:
-------------------------------
Summary: Out of memory - heap space - COSDocument
Key: PDFBOX-1907
URL: https://issues.apache.org/jira/browse/PDFBOX-1907
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.4
Environment: windows xp 64
jdk 8 32 bit
Reporter: Jim Kay
Possibly related to PDFBOX-1777.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.AbstractCollection.toArray(AbstractCollection.java:136)
at java.util.ArrayList.<init>(ArrayList.java:168)
at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
at techref.Testpdfbox.main(Testpdfbox.java:36)
The heap space is set to -Xmx1640m
The pdf docoument is parsed OK with version 1.8.3 but fails with 1.8.4
The large pdf document has the following attributes.
pdDoc.getCurrentAccessPermission.canExtractContent = true
pdDoc.getCurrentAccessPermission.canExtractForAccessibility = true
pdDoc.getNumberOfPages = 228
pdDoc.getDocumentCatalog.getLanguage = null
pdDoc.getDocumentCatalog.getPageLayout = SinglePage
pdDoc.getDocumentCatalog.getPageMode = UseNone
pdDoc.getDocumentCatalog.getVersion = null
Page Count=228
Title=Microsoft Word - FEA.doc
Author=null
Subject=null
Keywords=null
Creator=Windows NT 4.0
Producer=Acrobat Distiller 4.05 for Windows
Creation Date=Fri Jun 29 15:29:59 BST 2001
Modification Date=Mon Jul 02 15:41:18 BST 2001
Trapped=null
Dictionary=COSDictionary{(COSName{CreationDate}:COSString{D:20010629142959})
(COSName{Producer}:COSString{Acrobat Distiller 4.05 for Windows})
(COSName{Creator}:COSString{Windows NT 4.0})
(COSName{Title}:COSString{Microsoft Word - FEA.doc})
(COSName{ModDate}:COSString{D:20010702164118+02'00'}) }
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)