[ https://issues.apache.org/jira/browse/PDFBOX-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878884#action_12878884 ]
David Hedley commented on PDFBOX-720:
-------------------------------------
This looks like one instance of a more general problem, incorrect parsing of
incrementally updated PDFs, which I am also experiencing. Is anyone currently
investigating this?
> Inconsistency in parsing PDFs between Windows and Linux
> -------------------------------------------------------
>
> Key: PDFBOX-720
> URL: https://issues.apache.org/jira/browse/PDFBOX-720
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Environment: Windows Vista 32-bit, Sun JDK 1.5.0_06, PDFBox HEAD tag
> (revision 941073)
> vs.
> Red Hat Linux, 2.6.9-67.ELsmp kernel, Java 1.5.0_06, PDFBox HEAD tag
> (revision 941073)
> Reporter: Adam Nichols
> Fix For: 1.2.0
>
> Attachments: 238_Page_Report.pdf
>
>
> Run this same code using the same PDF and you'll get different results on
> Linux than on Windows. Regardless of which one you consider "correct", it
> should be consistent.
> PDDocument doc = PDDocument.load(inputFile);
> PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
> if (outline == null) {
>     System.out.println("Document outline was null");
> } else {
>     System.out.println("Document outline was not null");
> }
> Some interesting notes about this PDF: it seems that Acrobat Distiller 8.1.0
> basically just concatenated two PDFs into one. There are two trailers, and
> both refer to object "1600 0" as the root. Object 1600 0 appears multiple
> times: one definition has no "Outlines" entry in its dictionary, the other
> has "Outlines 1667 0". Windows picks up the latter and shows the outline
> correctly. Linux picks up the former and thus returns null for the outline.
> I tried debugging through PDFParser and BaseParser, but I'm not really sure
> how that code works and I quickly got lost.
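
To make the symptom concrete: if the file really is incrementally updated, as
the comment above suggests, then the later definition of an object is meant to
supersede the earlier one, so a parser that keeps the first "1600 0" it sees
will miss the Outlines entry. The following is only a toy sketch of that
"last definition wins" rule. It is not how PDFBox parses files. The class
name, the method name, and the sample text are made up for illustration, and
a real parser would follow the cross-reference chain (startxref and the
trailer's /Prev entry) rather than scan the text with a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of PDFBox.
class LastDefinitionWins {

    // Matches "<num> <gen> obj ... endobj"; DOTALL lets the body span lines.
    // Assumption: the body never contains the literal text "endobj"
    // (not true for arbitrary stream data in real PDFs).
    private static final Pattern OBJ =
        Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b(.*?)endobj", Pattern.DOTALL);

    /** Maps "num gen" to the body of the last definition found in file order. */
    static Map<String, String> latestObjects(String pdfText) {
        Map<String, String> latest = new LinkedHashMap<String, String>();
        Matcher m = OBJ.matcher(pdfText);
        while (m.find()) {
            // put() replaces any earlier definition with the later one.
            latest.put(m.group(1) + " " + m.group(2), m.group(3).trim());
        }
        return latest;
    }

    public static void main(String[] args) {
        // Made-up text mirroring the report: object 1600 0 is defined twice,
        // and only the second definition carries an /Outlines entry.
        String pdfText =
              "1600 0 obj\n<< /Type /Catalog >>\nendobj\n"
            + "1600 0 obj\n<< /Type /Catalog /Outlines 1667 0 R >>\nendobj\n";

        String root = latestObjects(pdfText).get("1600 0");
        System.out.println("Root 1600 0 has /Outlines: " + root.contains("/Outlines"));
    }
}
```

With the sample text above, the map ends up holding a single entry for
"1600 0", and that entry is the definition containing /Outlines. A parser that
stopped at the first definition would report a null outline, which matches the
Linux behaviour described in the report.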