[ https://issues.apache.org/jira/browse/PDFBOX-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-3238.
-----------------------------------
    Resolution: Duplicate

Closing as duplicate of PDFBOX-3428

> Page resources are not inherited from an ancestor node in the page tree
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3238
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3238
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.11, 2.0.0, 3.0.0 PDFBox
>         Environment: Found on Windows 7 x64, JRE from 1.5 to 8
>            Reporter: Evgeny Chesnokov
>            Priority: Major
>         Attachments: Welding Fixture Model.dwg.pdf
>
>
> Attached is a sample file with a single image on the 1st page in it. When I
> append the 1st page of a loaded document to a new document, the new document
> does not have an image in it (displayed as a blank page; Acrobat Reader says
> the file is broken).
> Steps to reproduce:
> 1. load an attached PDF file using PdfBox (checked versions 1.8.11 and
> 2.0.0-RC2, tried both {{#load()}} and {{#loadNonSeq()}})
> 2. create a new document
> 3. add a page from a loaded document to a new document
> 4. save a document to a new file.
> Expected: a new PDF file gets created, when opened, it contains an image on
> the 1st page.
> Actual behaviour: a new PDF file gets created, when opened, the 1st page is
> empty and Acrobat Reader reports an error ("An error exists on this page.
> Acrobat may not display the page correctly.").
> Code to reproduce the issue for version 1.8.11:
> {code}
>         PDDocument source = PDDocument.load(new File("Welding Fixture Model.dwg.pdf"));
>         PDPage page = (PDPage) source.getDocumentCatalog().getAllPages().get(0);
>
>         PDDocument destination = new PDDocument();
>         destination.addPage(page);
>         destination.save("Welding Fixture Model.dwg.page0.pdf");
>         destination.close();
> {code}
> ==========
> Research summary: I've decoded the attached PDF using {{qpdf}} utility and
> investigated its structure.
> Basically, there's no {{/Resources}} declaration
> in a {{/Page}} object, so it should get inherited from a {{/Pages}} object.
> Instead it is replaced with an empty resources object, so when saved, it does
> not have an image in it.
> Research details:
> Below are pieces of a decoded structure of the attached PDF.
> *Pages list declaration:*
> {noformat}
> 3 0 obj
> <<
>   /Count 1
>   /Kids [
>     4 0 R
>   ]
>   /Resources 5 0 R
>   /Type /Pages
> >>
> endobj
> {noformat}
> Explanation:
> - {{/Type /Pages}} says this object is a list of pages;
> - {{/Kids}} is an array of references to the individual page objects. In
> this case, object #4 is the only page in a document;
> - {{/Resources 5 0 R}} stores a reference to a single resource that is used
> by the {{/Pages}} object. This is object #5, an image.
> *1st page declaration:*
> {noformat}
> 4 0 obj
> <<
>   /Contents 6 0 R
>   /MediaBox [
>     0
>     0
>     1984
>     2551
>   ]
>   /Parent 3 0 R
>   /Type /Page
> >>
> endobj
> {noformat}
> Explanation:
> - {{/Type /Page}} says it's a page (duh);
> - {{/Contents 6 0 R}} references an object #6 that is used to render the
> content of the page (I won't provide it but it uses the image object #5
> mentioned above);
> - {{/Parent 3 0 R}} is a reference to a {{/Pages}} object described above.
> An important thing here is that this object does not have a {{/Resources}}
> section of its own. In this case, PDF spec says:
> bq. (Required; inheritable) A dictionary containing any resources required by
> the page (see 7.8.3, "Resource Dictionaries"). If the page requires no
> resources, the value of this entry shall be an empty dictionary. *Omitting
> the entry entirely indicates that the resources shall be inherited from an
> ancestor node in the page tree*.
> This last sentence means that Page 1 has the same list of resources as its
> parent /Pages object, and this is where PdfBox misbehaves.
> When exporting a
> page with no {{/Resources}} tag, it uses an **EMPTY** list of resources
> instead of an inherited one.
> To verify this, I've added {{/Resources 5 0 R}} line to the sample PDF 1st
> page declaration:
> {noformat}
> 4 0 obj
> <<
>   /Contents 6 0 R
>   /MediaBox [
>     0
>     0
>     1984
>     2551
>   ]
>   /Parent 3 0 R
>   /Resources 5 0 R
>   /Type /Page
> >>
> endobj
> {noformat}
> After I did this, PdfBox successfully extracted the 1st page of this document
> and it correctly displayed an image.
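
For readers following the report, the inheritance rule quoted from the spec can be sketched in plain Java. This is only an illustration under stated assumptions, not PDFBox code and not the fix that was applied for the duplicate issue: the Dict class, findInheritable and detach are hypothetical names, indirect references are modelled as plain strings such as "5 0 R", and the list of inheritable keys (Resources, MediaBox, CropBox, Rotate) is taken from the PDF specification's page object table. The idea is that a page moved out of its tree should first have every inheritable entry it lacks copied down from the nearest ancestor that defines it.

```java
import java.util.HashMap;
import java.util.Map;

public class InheritedResources {

    /** Hypothetical stand-in for a PDF dictionary; NOT a PDFBox type. */
    static final class Dict {
        final Map<String, Object> entries = new HashMap<>();
        Dict parent; // models the /Parent reference; null for the tree root

        Dict put(String key, Object value) {
            entries.put(key, value);
            return this;
        }
    }

    /** Page attributes the PDF spec marks as inheritable. */
    static final String[] INHERITABLE = {"Resources", "MediaBox", "CropBox", "Rotate"};

    /** Returns the value from the nearest node (page first, then ancestors) that defines key, or null. */
    static Object findInheritable(Dict node, String key) {
        for (Dict current = node; current != null; current = current.parent) {
            Object value = current.entries.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Copies a page for use in another document, writing inherited entries into the copy. */
    static Dict detach(Dict page) {
        Dict copy = new Dict();
        copy.entries.putAll(page.entries);
        for (String key : INHERITABLE) {
            if (!copy.entries.containsKey(key)) {
                Object inherited = findInheritable(page, key);
                if (inherited != null) {
                    copy.entries.put(key, inherited);
                }
            }
        }
        return copy; // copy.parent stays null: the copy no longer depends on the old tree
    }

    public static void main(String[] args) {
        // Object 3 from the report: the /Pages node that owns /Resources.
        Dict pages = new Dict().put("Type", "/Pages").put("Resources", "5 0 R");
        // Object 4 from the report: the page, with no /Resources of its own.
        Dict page = new Dict().put("Type", "/Page").put("Contents", "6 0 R");
        page.parent = pages;

        System.out.println(page.entries.get("Resources"));         // the page itself has none
        System.out.println(detach(page).entries.get("Resources")); // resolved from the parent
    }
}
```

With the two objects from the report, the page's own dictionary has no Resources entry, while the detached copy carries the parent's reference, which mirrors the manual edit described above (adding {{/Resources 5 0 R}} to object 4). The sketch does not guard against a cyclic /Parent chain, which a real parser would need to handle for malformed files.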