[ 
https://issues.apache.org/jira/browse/PDFBOX-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-3238.
-----------------------------------
    Resolution: Duplicate

Closing as duplicate of PDFBOX-3428

> Page resources are not inherited from an ancestor node in the page tree
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3238
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3238
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.11, 2.0.0, 3.0.0 PDFBox
>         Environment: Found on Windows 7 x64, JRE from 1.5 to 8
>            Reporter: Evgeny Chesnokov
>            Priority: Major
>         Attachments: Welding Fixture Model.dwg.pdf
>
>
> Attached is a sample file with a single image on its 1st page. When I 
> append the 1st page of the loaded document to a new document, the new 
> document does not contain the image (it is displayed as a blank page; 
> Acrobat Reader says the file is broken).
> Steps to reproduce:
> 1. load the attached PDF file using PdfBox (checked versions 1.8.11 and 
> 2.0.0-RC2, tried both {{#load()}} and {{#loadNonSeq()}})
> 2. create a new document
> 3. add a page from the loaded document to the new document
> 4. save the new document to a new file.
> Expected: a new PDF file gets created; when opened, it contains the image 
> on the 1st page.
> Actual behaviour: a new PDF file gets created; when opened, the 1st page is 
> empty and Acrobat Reader reports an error ("An error exists on this page. 
> Acrobat may not display the page correctly.").
> Code to reproduce the issue for version 1.8.11:
> {code}
>         PDDocument source = PDDocument.load(new File("Welding Fixture Model.dwg.pdf"));
>         PDPage page = (PDPage) source.getDocumentCatalog().getAllPages().get(0);
>         
>         PDDocument destination = new PDDocument();
>         destination.addPage(page);
>         destination.save("Welding Fixture Model.dwg.page0.pdf");
>         destination.close();
> {code}
> ==========
> Research summary: I've decoded the attached PDF using the {{qpdf}} utility 
> and investigated its structure. Basically, there's no {{/Resources}} entry 
> in the {{/Page}} object, so it should be inherited from the {{/Pages}} 
> object. Instead it is replaced with an empty resources object, so the saved 
> page does not have the image in it.
> Research details:
> Below are pieces of a decoded structure of the attached PDF.
> *Pages list declaration:*
> {noformat}
> 3 0 obj
> <<
>   /Count 1
>   /Kids [
>     4 0 R
>   ]
>   /Resources 5 0 R
>   /Type /Pages
> >>
> endobj
> {noformat}
> Explanation:
>  - {{/Type /Pages}} says this object is a list of pages;
>  - {{/Kids}} is an array of references to the individual page objects. In 
> this case, object #4 is the only page in the document;
>  - {{/Resources 5 0 R}} stores a reference to a single resource that is used 
> by the {{/Pages}} object. This is object #5, an image.
> *1st page declaration:*
> {noformat}
> 4 0 obj
> <<
>   /Contents 6 0 R
>   /MediaBox [
>     0
>     0
>     1984
>     2551
>   ]
>   /Parent 3 0 R
>   /Type /Page
> >>
> endobj
> {noformat}
> Explanation:
>  - {{/Type /Page}} says it's a page (duh);
>  - {{/Contents 6 0 R}} references object #6, which is used to render the 
> content of the page (I won't provide it, but it uses the image object #5 
> mentioned above);
>  - {{/Parent 3 0 R}} is a reference to a {{/Pages}} object described above.
> An important thing here is that this object does not have a {{/Resources}} 
> entry of its own. For this case, the PDF spec says:
> bq. (Required; inheritable) A dictionary containing any resources required by 
> the page (see 7.8.3, "Resource Dictionaries"). If the page requires no 
> resources, the value of this entry shall be an empty dictionary. *Omitting 
> the entry entirely indicates that the resources shall be inherited from an 
> ancestor node in the page tree*.
> This last sentence means that Page 1 has the same resources as its parent 
> {{/Pages}} object, and this is where PdfBox misbehaves. When exporting a 
> page with no {{/Resources}} entry, it uses an *EMPTY* resources object 
> instead of the inherited one.
> To verify this, I added a {{/Resources 5 0 R}} line to the 1st page 
> declaration of the sample PDF:
> {noformat}
> 4 0 obj
> <<
>   /Contents 6 0 R
>   /MediaBox [
>     0
>     0
>     1984
>     2551
>   ]
>   /Parent 3 0 R
>   /Resources 5 0 R
>   /Type /Page
> >>
> endobj
> {noformat}
> After I did this, PdfBox successfully extracted the 1st page of this document 
> and it correctly displayed an image.
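
The inheritance rule quoted in the report can be illustrated with a small
sketch. This is not PDFBox code: it models the dictionaries of objects #3 and
#4 as plain java.util.Map instances, and the class and method names
(InheritedResources, findInherited, materialize) are hypothetical, chosen
only for this illustration. It shows a lookup that walks the /Parent chain,
and the manual workaround from the report (copying the inherited entry onto
the page before the page is detached from its tree). It assumes the parent
chain has no cycles.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of PDF page-tree dictionaries; hypothetical, not PDFBox code. */
public class InheritedResources {

    /**
     * Looks up key on the node itself, then on each ancestor reached through
     * the "Parent" entry. Returns null if no node on the chain defines it.
     */
    @SuppressWarnings("unchecked")
    static Object findInherited(Map<String, Object> node, String key) {
        Map<String, Object> current = node;
        while (current != null) {
            Object value = current.get(key);
            if (value != null) {
                return value;
            }
            Object parent = current.get("Parent");
            current = (parent instanceof Map) ? (Map<String, Object>) parent : null;
        }
        return null;
    }

    /** Copies an inherited value onto the page so it survives detaching the page. */
    static void materialize(Map<String, Object> page, String key) {
        if (!page.containsKey(key)) {
            Object inherited = findInherited(page, key);
            if (inherited != null) {
                page.put(key, inherited);
            }
        }
    }

    public static void main(String[] args) {
        // Object #3 from the report: the /Pages node that owns /Resources.
        Map<String, Object> pages = new HashMap<>();
        pages.put("Type", "Pages");
        pages.put("Resources", "5 0 R");

        // Object #4 from the report: the /Page node without its own /Resources.
        Map<String, Object> page = new HashMap<>();
        page.put("Type", "Page");
        page.put("Contents", "6 0 R");
        page.put("Parent", pages);

        System.out.println(page.get("Resources"));            // null
        System.out.println(findInherited(page, "Resources")); // 5 0 R

        // Copy the entry down, then detach the page from its original tree.
        materialize(page, "Resources");
        page.remove("Parent");
        System.out.println(page.get("Resources"));            // 5 0 R
    }
}
```

Reading only the page's own dictionary corresponds to the first printed
value; the second and third correspond to the behaviour the spec describes
and to the manual fix described in the report.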



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
