[jira] [Commented] (PDFBOX-1623) Can't detect hidden text on pdf page

JIRA Sat, 15 Jun 2013 09:22:05 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684231#comment-13684231
 ]


Andreas Lehmkühler commented on PDFBOX-1623:
--------------------------------------------

OK, I see. So we should introduce a switch that lets the text extraction skip 
those parts of the text.

BTW, Acrobat Reader does the same: it dumps the whole text, including the 
hidden parts.
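
A minimal sketch of what such a switch could look like, assuming the hidden 
text is drawn with text rendering mode 3 (neither fill nor stroke, i.e. 
invisible). That is only an assumption: the attached PDF may hide its text 
in some other way, for example by clipping it or covering it. All names 
below (HiddenTextSwitchSketch, TextChunk, setSkipHiddenText) are 
hypothetical and are not part of the PDFBox API.

```java
import java.util.List;

/** Hypothetical sketch of the proposed switch; none of these types are PDFBox API. */
final class HiddenTextSwitchSketch {

    /** PDF text rendering mode 3: neither fill nor stroke, i.e. invisible text. */
    static final int RENDER_MODE_INVISIBLE = 3;

    /** A piece of text plus the rendering mode that was active when it was shown. */
    static final class TextChunk {
        final String text;
        final int renderingMode;

        TextChunk(String text, int renderingMode) {
            this.text = text;
            this.renderingMode = renderingMode;
        }
    }

    /** The proposed switch; off by default so the current behaviour is unchanged. */
    private boolean skipHiddenText = false;

    void setSkipHiddenText(boolean skipHiddenText) {
        this.skipHiddenText = skipHiddenText;
    }

    /** Concatenates the chunk texts, dropping invisible chunks only when the switch is on. */
    String extract(List<TextChunk> chunks) {
        StringBuilder out = new StringBuilder();
        for (TextChunk chunk : chunks) {
            if (skipHiddenText && chunk.renderingMode == RENDER_MODE_INVISIBLE) {
                continue; // assumed definition of "hidden": rendering mode 3
            }
            out.append(chunk.text);
        }
        return out.toString();
    }
}
```

With the switch off, every chunk is dumped, which matches the behaviour 
described above. With setSkipHiddenText(true), chunks flagged as invisible 
are left out of the result.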
                
> Can't detect hidden text on pdf page
> ------------------------------------
>
>                 Key: PDFBOX-1623
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1623
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Text extraction
>    Affects Versions: 1.8.1
>         Environment: Windows, Java 7
>            Reporter: james king
>         Attachments: test_hidden_text.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I am trying to extract the text from a page, but the extraction also returns 
> the hidden text, which I don't need. How can I detect whether text is hidden 
> or not? Normally, hidden text is text the user does not want to extract!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
