[ https://issues.apache.org/jira/browse/PDFBOX-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-1143:
--------------------------------
    Fix Version/s: 2.0.0

> PDFTextStripper doesn't process text annotations
> ------------------------------------------------
>
>                 Key: PDFBOX-1143
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1143
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.0
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> Users are able to add annotations (comments) to a PDF, and PDFBox
> processes them correctly: you can retrieve them via
> PDPage.getAnnotations.
> But PDFTextStripper currently doesn't extract the text from
> annotations.
> I think it should, optionally.
> I think we'd add a boolean (shouldProcessAnnotations?), and if
> enabled, we'd visit the annotations that have sub-type FreeText, and
> extract what text we can (Subject, TitlePopup, Contents, maybe
> RichContents?), associate the .getRectangle with the text to make a
> TextPosition, and then somehow associate that with the right
> "article" (so that annotations "over" a given article are rendered
> with it).
> Alternatively we just put all annotations into their own "article"?
> I'm not familiar enough with PDF text positioning or PDFTextStripper
> to work out a real patch here, but I think this approach should
> work.
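
For illustration only, the simpler alternative described above (put all
annotations into their own "article") could look roughly like the sketch
below. It does not use the PDFBox API: the Annotation class, the
extractAnnotationArticle method and the AnnotationTextSketch class are
hypothetical stand-ins, and only the FreeText subtype and the
shouldProcessAnnotations flag come from the issue text. The ordering rule
(top of page first, assuming the PDF y axis grows upward) is also an
assumption, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model; none of these types are PDFBox classes.
final class AnnotationTextSketch {

    /** Minimal stand-in for the annotation fields mentioned in the issue. */
    static final class Annotation {
        final String subtype;   // e.g. "FreeText" or "Link"
        final String contents;  // annotation text, may be null
        final float x;          // lower-left x of the annotation rectangle
        final float y;          // lower-left y of the annotation rectangle

        Annotation(String subtype, String contents, float x, float y) {
            this.subtype = subtype;
            this.contents = contents;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Collects the text of FreeText annotations into one separate "article".
     * Returns an empty list when the (proposed) flag is disabled.
     */
    static List<String> extractAnnotationArticle(List<Annotation> annotations,
                                                 boolean shouldProcessAnnotations) {
        List<String> article = new ArrayList<>();
        if (!shouldProcessAnnotations) {
            return article;
        }
        // Keep only FreeText annotations that actually carry text.
        List<Annotation> freeText = new ArrayList<>();
        for (Annotation a : annotations) {
            if ("FreeText".equals(a.subtype) && a.contents != null && !a.contents.isEmpty()) {
                freeText.add(a);
            }
        }
        // Assumed reading order: higher y first (top of page), then left to right.
        freeText.sort(Comparator
                .comparingDouble((Annotation a) -> -a.y)
                .thenComparingDouble(a -> a.x));
        for (Annotation a : freeText) {
            article.add(a.contents);
        }
        return article;
    }

    public static void main(String[] args) {
        List<Annotation> page = Arrays.asList(
                new Annotation("FreeText", "second note", 50f, 100f),
                new Annotation("Link", null, 0f, 0f),
                new Annotation("FreeText", "first note", 50f, 700f));
        // prints [first note, second note]
        System.out.println(extractAnnotationArticle(page, true));
    }
}
```

Associating each annotation with the article it sits "over" would need
the article bounds as well, which this sketch deliberately leaves out.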



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
