Inconsistency in parsing PDFs between Windows and Linux
-------------------------------------------------------

                 Key: PDFBOX-720
                 URL: https://issues.apache.org/jira/browse/PDFBOX-720
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
         Environment: Windows Vista 32-bit, Sun JDK 1.5.0_06, PDFBox HEAD tag (revision 941073)
                      vs.
                      Red Hat Linux, 2.6.9-67.ELsmp kernel, Java 1.5.0_06, PDFBox HEAD tag (revision 941073)
            Reporter: Adam Nichols
             Fix For: 1.2.0


Running the same code on the same PDF gives different results on Linux than on 
Windows.  Regardless of which result you consider "correct", the behavior should 
be consistent across platforms.

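import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
PDDocument doc = PDDocument.load(inputFile);
try {
    PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
    if (outline == null)
        System.out.println("Document outline was null");
    else
        System.out.println("Document outline was not null");
} finally {
    doc.close(); // release the underlying file even if the lookup throws
}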

Some interesting notes about this PDF: it seems that Acrobat Distiller 8.1.0 
essentially concatenated two PDFs into one.  There are two trailers, and both 
refer to object "1600 0" as the root.  Object 1600 0 is defined more than once: 
one definition has no "Outlines" entry in its dictionary, while another has 
"Outlines 1667 0".  Windows picks up the latter and shows the outline 
correctly; Linux picks up the former and therefore returns null for the 
outline.  I tried debugging through PDFParser and BaseParser, but I'm not 
really sure how that code works and quickly got lost.
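The report does not say how PDFParser chooses between the two definitions of 1600 0. As a hedged illustration of what a platform-independent choice could look like, the standalone Java sketch below scans raw PDF text for indirect-object headers such as "1600 0 obj", records every position at which each object is defined, and then applies one fixed rule: the definition appearing last in the file wins, which is how a later incremental update normally supersedes an earlier object. The class DuplicateObjectScan and its methods are hypothetical and are not part of PDFBox. The rule is also a simplification that ignores the cross-reference tables; a real parser should resolve objects through the xref sections and their /Prev chain rather than by scanning text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic, not part of PDFBox: finds every "num gen obj" header
 * in raw PDF text and resolves duplicates with one deterministic rule
 * (the last definition in the file wins).
 */
public class DuplicateObjectScan {

    // Matches an indirect object header such as "1600 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Maps each "num gen" key to every character position where it is defined, in file order. */
    public static Map<String, List<Integer>> findDefinitions(String pdfText) {
        Map<String, List<Integer>> defs = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(pdfText);
        while (m.find()) {
            String key = m.group(1) + " " + m.group(2);
            defs.computeIfAbsent(key, k -> new ArrayList<>()).add(m.start());
        }
        return defs;
    }

    /** Deterministic rule (an assumption for this sketch): use the last definition in the file. */
    public static int resolve(Map<String, List<Integer>> defs, String key) {
        List<Integer> positions = defs.get(key);
        if (positions == null || positions.isEmpty()) {
            throw new IllegalArgumentException("no definition for " + key);
        }
        return positions.get(positions.size() - 1);
    }

    public static void main(String[] args) {
        // Two catalogs with the same object number, mimicking the reported file.
        String pdf = "1600 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
                   + "1600 0 obj << /Type /Catalog /Pages 2 0 R /Outlines 1667 0 R >> endobj\n";
        Map<String, List<Integer>> defs = findDefinitions(pdf);
        int chosen = resolve(defs, "1600 0");
        String body = pdf.substring(chosen, pdf.indexOf("endobj", chosen));
        System.out.println(defs.get("1600 0").size() + " definitions; chosen contains /Outlines: "
                + body.contains("/Outlines"));
    }
}
```

Running main prints "2 definitions; chosen contains /Outlines: true" for the two-catalog example, which matches what the reporter sees on Windows. The point is not which copy is chosen but that the choice depends only on the file's contents, never on the platform.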

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
