[jira] [Commented] (PDFBOX-3398) Text (XML) output of pdf structure

Maruan Sahyoun (JIRA) Sun, 26 Jun 2016 13:55:30 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350234#comment-15350234
 ]


Maruan Sahyoun commented on PDFBOX-3398:
----------------------------------------

There is no such information in the PDF document linked from the SO article.

As far as I can see, the StructureTree is used to identify the different 
building blocks of the document and points to the corresponding marked content 
sequences in the page content stream. The only accessibility feature used is 
the language specification via the {{Lang}} attribute. As far as I can tell, 
there is no plain text definition or replacement text, which is why you 
couldn't find it.
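
To make the relationship concrete, here is a minimal sketch in plain Java of 
how such a structure could be written out as XML-like text. It is an 
illustration only: the Node class, the toXml method and the attribute names 
"lang" and "mcids" are hypothetical and are not part of PDFBox. It also assumes 
that structure types are usable as XML element names, which a real document 
does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StructureDump {

    /** Hypothetical stand-in for one structure element; not a PDFBox class. */
    public static final class Node {
        final String type;          // structure type, e.g. "Document" or "P"
        final String lang;          // value of the Lang attribute, or null if absent
        final List<Integer> mcids;  // IDs of the marked content sequences pointed to
        final List<Node> kids = new ArrayList<>();

        public Node(String type, String lang, Integer... mcids) {
            this.type = type;
            this.lang = lang;
            this.mcids = Arrays.asList(mcids);
        }

        public Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Renders the tree as indented XML-like text, one element per line. */
    public static String toXml(Node root) {
        StringBuilder sb = new StringBuilder();
        write(root, 0, sb);
        return sb.toString();
    }

    private static void write(Node n, int depth, StringBuilder sb) {
        indent(depth, sb);
        sb.append('<').append(n.type);
        if (n.lang != null) {
            sb.append(" lang=\"").append(escape(n.lang)).append('"');
        }
        if (!n.mcids.isEmpty()) {
            sb.append(" mcids=\"");
            for (int i = 0; i < n.mcids.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(n.mcids.get(i));
            }
            sb.append('"');
        }
        if (n.kids.isEmpty()) {
            sb.append("/>\n");  // leaf element: self-closing tag
            return;
        }
        sb.append(">\n");
        for (Node kid : n.kids) {
            write(kid, depth + 1, sb);
        }
        indent(depth, sb);
        sb.append("</").append(n.type).append(">\n");
    }

    private static void indent(int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
    }

    /** Escapes the characters that are unsafe inside an attribute value. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Invented sample tree: a document with one heading and one paragraph.
        Node root = new Node("Document", "en-US")
                .add(new Node("H1", null, 0))
                .add(new Node("P", null, 1, 2));
        System.out.print(toXml(root));
    }
}
```

In this sketch the language appears once on the root element and the leaf 
elements carry only marked content IDs, which mirrors the situation described 
above: the tree identifies building blocks and points into the content stream, 
but holds no text of its own.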

> Text (XML) output of pdf structure
> ----------------------------------
>
>                 Key: PDFBOX-3398
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3398
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Parsing, Utilities
>            Reporter: Stefan Hegny
>            Priority: Minor
>
> It would be nice to have a text/XML representation output to pdf file of the 
> entire document structure, as can be browsed in the debugger window GUI. It 
> would allow for easier searching and understanding of the structure. Not sure 
> if it should be an option to PDFReader/PDFDebugger or a separate class that 
> might also be bundled into an app jar. I would even start working on it, given 
> the preferred base to start on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

