[
https://issues.apache.org/jira/browse/PDFBOX-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187375#comment-14187375
]
Tilman Hausherr commented on PDFBOX-2423:
-----------------------------------------
TestTextStripper fails
While the annotations are now correct, there are slight differences from the
results of last Friday.
The output for the Braun file (bugzilla867751.pdf) is incorrect on the first
page: the "B" is missing.
example_026.pdf looks like somebody bled on it.
> Page tree handling needs rewriting
> ----------------------------------
>
> Key: PDFBOX-2423
> URL: https://issues.apache.org/jira/browse/PDFBOX-2423
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 1.8.7, 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
> Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: 26101_Colors.ai-1.png, 26101_Colors.ai-1.png-diff.png,
> Basiswissen-Vorschriften.pdf-3.png,
> Basiswissen-Vorschriften.pdf-3.png-diff.png,
> Basiswissen-Vorschriften.pdf-4.png,
> Basiswissen-Vorschriften.pdf-4.png-diff.png, PDFBOX-1058.pdf-1.png,
> PDFBOX-1058.pdf-1.png-diff.png, PDFBOX-1058.pdf-4.png,
> PDFBOX-1058.pdf-4.png-diff.png, PDFBOX-1094-tiling_pattern.pdf,
> PDFBOX-1711-cmyk.pdf-1.png, PDFBOX-1711-cmyk.pdf-1.png-diff.png,
> PDFBOX-1794-vattenfall.pdf-1.png, PDFBOX-1794-vattenfall.pdf-1.png-diff.png,
> PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png,
> asy-functionshading.pdf-1.png, asy-functionshading.pdf-1.png-diff.png,
> gs-bugzilla694385.pdf, jagpdf_doc_patterns.pdf
>
>
> The way in which PDFBox handles the Page tree needs to be rewritten,
> preferably from scratch. Currently the document catalog returns the raw
> objects from the page tree, wrapped in either a PDPage or PDPageNode.
> We need to abstract over the page tree and get rid of PDPageNode; we should
> provide methods which can add/remove PDPage objects *only*. The existing
> low-level access to the page tree is not needed at the PD-level.
> Inheritance of page properties such as crop box, resources, and rotation
> should be reimplemented to use whatever new page tree abstraction we invent.
> We can finally remove the old broken methods which didn't look up the
> inheritance tree when retrieving these values.
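A minimal Java sketch of the abstraction described in the issue, assuming a simplified in-memory model rather than the PDFBox COS classes. TreeNode, PageTree, pages(), getInherited(), addPage() and removePage() are hypothetical names, not PDFBox API. Callers only ever see leaf pages; intermediate /Pages nodes stay internal, and inheritable attributes (Resources, MediaBox, CropBox, Rotate, per the PDF specification) are resolved by walking up the parent chain. For brevity it does not maintain the /Count entries or prune empty /Pages nodes, which a real implementation would have to do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a page tree dictionary (/Type /Pages or /Page).
// Illustrative only: this is not the PDFBox API.
final class TreeNode {
    final boolean isPage;                      // true for a leaf /Page, false for /Pages
    final Map<String, Object> attributes = new HashMap<>();
    final List<TreeNode> kids = new ArrayList<>();
    TreeNode parent;

    TreeNode(boolean isPage) { this.isPage = isPage; }
}

// Sketch of a page tree wrapper that exposes pages (leaves) only.
final class PageTree {
    // Page attributes marked as inheritable in the PDF specification.
    private static final List<String> INHERITABLE =
            List.of("Resources", "MediaBox", "CropBox", "Rotate");

    private final TreeNode root;

    PageTree(TreeNode root) { this.root = root; }

    // Depth-first, left-to-right traversal returns pages in document order.
    List<TreeNode> pages() {
        List<TreeNode> result = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeNode node = stack.pop();
            if (node.isPage) {
                result.add(node);
            } else {
                // Push kids in reverse so the first kid is popped first.
                for (int i = node.kids.size() - 1; i >= 0; i--) {
                    stack.push(node.kids.get(i));
                }
            }
        }
        return result;
    }

    // Check the page itself, then walk up the parent chain for inheritable keys.
    Object getInherited(TreeNode page, String key) {
        for (TreeNode n = page; n != null; n = n.parent) {
            Object value = n.attributes.get(key);
            if (value != null) return value;
            if (!INHERITABLE.contains(key)) break; // non-inheritable: page only
        }
        return null;
    }

    // Mutation accepts pages only; this sketch appends new pages to the root.
    void addPage(TreeNode page) {
        if (!page.isPage) throw new IllegalArgumentException("only pages may be added");
        page.parent = root;
        root.kids.add(page);
    }

    // Detach a page from its parent; returns false if it was not in a tree.
    boolean removePage(TreeNode page) {
        if (!page.isPage || page.parent == null) return false;
        boolean removed = page.parent.kids.remove(page);
        if (removed) page.parent = null;
        return removed;
    }
}
```

The traversal uses an explicit stack rather than recursion, so a very deep tree cannot overflow the Java call stack; a malformed file with a cycle in its /Kids would still need separate cycle detection.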