[ https://issues.apache.org/jira/browse/PDFBOX-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898158#comment-13898158 ]
John Hewson commented on PDFBOX-1907:
-------------------------------------
{quote}
Sorry pdf is confidential, so I can't post it.
{quote}
If you can extract a non-confidential page using PDFSplit and that page
exhibits the same problem, that would work.
There's not much we can do about heap space issues without an attached PDF.
The allocations that are filling the heap almost certainly happened somewhere
else in PDFBox, not at the point shown in the stack trace, so the trace alone
gives us nothing to go on.
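If the PDF can't be shared, logging heap usage at a few checkpoints can at least show which step makes the heap grow. Below is a minimal sketch that uses only java.lang.Runtime. HeapProbe and its methods are hypothetical names, not part of PDFBox. The "used" figure also counts garbage the collector hasn't reclaimed yet, so treat it as a rough indicator.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of PDFBox): records heap usage at named
 * checkpoints so you can see which step makes the heap grow, instead of
 * relying on the frame that happened to throw OutOfMemoryError.
 */
public class HeapProbe {
    private final Runtime rt = Runtime.getRuntime();
    private final List<String> log = new ArrayList<>();

    /** Bytes in use: committed heap minus free heap (includes uncollected garbage). */
    public long usedBytes() {
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Heap currently committed by the JVM. */
    public long committedBytes() {
        return rt.totalMemory();
    }

    /** Upper bound the heap may grow to; reflects -Xmx when it is set. */
    public long maxBytes() {
        return rt.maxMemory();
    }

    /** Logs used heap (in MiB) at a named point and returns the used value in bytes. */
    public long checkpoint(String label) {
        long used = usedBytes();
        String line = String.format("%-20s used=%,d MiB of max=%,d MiB",
                label, used / (1024 * 1024), maxBytes() / (1024 * 1024));
        log.add(line);
        System.out.println(line);
        return used;
    }

    /** All checkpoint lines recorded so far. */
    public List<String> entries() {
        return log;
    }

    public static void main(String[] args) {
        HeapProbe probe = new HeapProbe();
        probe.checkpoint("start");
        // Stand-in for a real processing step: allocate ~8 MiB and keep it reachable.
        byte[][] blocks = new byte[8][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }
        probe.checkpoint("after allocation");
        System.out.println("blocks held: " + blocks.length);
    }
}
```

In the reporter's Testpdfbox, checkpoints could go immediately before and after the PDFParser.parse() call from the stack trace, and maxBytes() should come out close to the -Xmx1640m setting if that flag took effect. For a fuller picture, HotSpot's -XX:+HeapDumpOnOutOfMemoryError flag writes a heap dump at the moment the error is thrown.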
> Out of memory - heap space - COSDocument
> ----------------------------------------
>
> Key: PDFBOX-1907
> URL: https://issues.apache.org/jira/browse/PDFBOX-1907
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.4
> Environment: windows xp 64
> jdk 8 32 bit
> Reporter: Jim Kay
> Labels: regression
>
> Possibly related to PDFBOX-1777.
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.AbstractCollection.toArray(AbstractCollection.java:136)
> at java.util.ArrayList.<init>(ArrayList.java:168)
> at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
> at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
> at techref.Testpdfbox.main(Testpdfbox.java:36)
> The heap space is set to -Xmx1640m.
> The PDF document is parsed OK with version 1.8.3 but fails with 1.8.4.
> The large PDF document has the following attributes:
> pdDoc.getCurrentAccessPermission.canExtractContent = true
> pdDoc.getCurrentAccessPermission.canExtractForAccessibility = true
> pdDoc.getNumberOfPages = 228
> pdDoc.getDocumentCatalog.getLanguage = null
> pdDoc.getDocumentCatalog.getPageLayout = SinglePage
> pdDoc.getDocumentCatalog.getPageMode = UseNone
> pdDoc.getDocumentCatalog.getVersion = null
> Page Count=228
> Title=Microsoft Word - FEA.doc
> Author=null
> Subject=null
> Keywords=null
> Creator=Windows NT 4.0
> Producer=Acrobat Distiller 4.05 for Windows
> Creation Date=Fri Jun 29 15:29:59 BST 2001
> Modification Date=Mon Jul 02 15:41:18 BST 2001
> Trapped=null
> Dictionary=COSDictionary{(COSName{CreationDate}:COSString{D:20010629142959})
> (COSName{Producer}:COSString{Acrobat Distiller 4.05 for Windows})
> (COSName{Creator}:COSString{Windows NT 4.0})
> (COSName{Title}:COSString{Microsoft Word - FEA.doc})
> (COSName{ModDate}:COSString{D:20010702164118+02'00'}) }
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)