[ 
https://issues.apache.org/jira/browse/PDFBOX-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133194#comment-14133194
 ] 

Andreas Lehmkühler commented on PDFBOX-1907:
--------------------------------------------

OK, I double-checked: it seems to be OS/JDK dependent. It works fine on 
Linux but crashes on Windows 7. :-(

However, the PDF itself seems to be problematic, as the non-sequential parser 
complains about the xref offset. Furthermore, it contains some content which 
isn't needed or doesn't belong to the provided pages. Maybe that's the result 
of the split.

So, is there anything we can do other than use the non-sequential parser? 
Should we close this one as resolved?
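
For anyone who wants to check the xref offset complaint independently of PDFBox, here is a minimal sketch using only the JDK. It reads the number after the last "startxref" keyword and tests whether the bytes at that offset start with "xref". The class and method names (XrefOffsetCheck, readStartXref, pointsAtXref) are made up for illustration and are not part of PDFBox. It only handles classic cross-reference tables, which is an assumption based on the file coming from Distiller 4.05; files that use cross-reference streams would need a different check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of PDFBox.
public class XrefOffsetCheck {

    /** Returns the number following the last "startxref" keyword, or -1 if none can be read. */
    static long readStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int pos = keyword + "startxref".length();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        return pos == start ? -1 : Long.parseLong(text.substring(start, pos));
    }

    /** True if a classic cross-reference table ("xref" keyword) begins exactly at the given offset. */
    static boolean pointsAtXref(byte[] pdf, long offset) {
        byte[] keyword = "xref".getBytes(StandardCharsets.ISO_8859_1);
        if (offset < 0 || offset + keyword.length > pdf.length) {
            return false;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (pdf[(int) offset + i] != keyword[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory; acceptable for a one-off check only.
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        long offset = readStartXref(pdf);
        System.out.println("startxref = " + offset
                + ", points at xref table: " + pointsAtXref(pdf, offset));
    }
}
```

Run it with the path of the PDF as the only argument. A result of "false" would be consistent with the parser's complaint, but it does not say whether the offset was already wrong before the split.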





> Out of memory - COSDocument (RandomAccessBuffer)
> ------------------------------------------------
>
>                 Key: PDFBOX-1907
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1907
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.4
>         Environment: windows xp 64
> jdk 8 32 bit
>            Reporter: Jim Kay
>              Labels: regression
>         Attachments: 8283.zip.001, 8283.zip.002, 8283.zip.003
>
>
> Possibly related to PDFBOX-1777.
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>       at java.util.AbstractCollection.toArray(AbstractCollection.java:136)
>       at java.util.ArrayList.<init>(ArrayList.java:168)
>       at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
>       at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
>       at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
>       at techref.Testpdfbox.main(Testpdfbox.java:36)
> The heap space is set to -Xmx1640m
> The PDF document is parsed OK with version 1.8.3 but fails with 1.8.4
> The large pdf document has the following attributes.
> pdDoc.getCurrentAccessPermission.canExtractContent = true
> pdDoc.getCurrentAccessPermission.canExtractForAccessibility = true
> pdDoc.getNumberOfPages = 228
> pdDoc.getDocumentCatalog.getLanguage = null
> pdDoc.getDocumentCatalog.getPageLayout = SinglePage
> pdDoc.getDocumentCatalog.getPageMode = UseNone
> pdDoc.getDocumentCatalog.getVersion = null
> Page Count=228
> Title=Microsoft Word - FEA.doc
> Author=null
> Subject=null
> Keywords=null
> Creator=Windows NT 4.0
> Producer=Acrobat Distiller 4.05 for Windows
> Creation Date=Fri Jun 29 15:29:59 BST 2001
> Modification Date=Mon Jul 02 15:41:18 BST 2001
> Trapped=null
> Dictionary=COSDictionary{(COSName{CreationDate}:COSString{D:20010629142959}) 
> (COSName{Producer}:COSString{Acrobat Distiller 4.05 for Windows}) 
> (COSName{Creator}:COSString{Windows NT 4.0}) 
> (COSName{Title}:COSString{Microsoft Word - FEA.doc}) 
> (COSName{ModDate}:COSString{D:20010702164118+02'00'}) }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
