[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

Nick Burch (JIRA) Mon, 01 Dec 2014 06:54:55 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229867#comment-14229867
 ]


Nick Burch commented on TIKA-1489:
----------------------------------

Can someone pull together a list of the common permissions for PDF? We can then 
cross-reference that against the permissions in other formats (e.g. Excel, 
Word), so that if we go down the "expose the permissions" route we have a 
consistent model. (We try to map metadata into the same structure no matter 
what format it comes from, so it makes sense to do the same for 
permissions.)
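
As a rough starting point for such a list, here is a minimal sketch of what a 
format-neutral permission model could look like. The class, enum and method 
names are hypothetical (this is not existing Tika or PDFBox API); the bit 
positions are the user access permission bits of the /P entry as described in 
the PDF specification. Mappings for Excel or Word are left out and would need 
to be worked out separately.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a format-neutral permission model; not an existing Tika API.
final class PermissionSketch {

    enum DocumentPermission {
        PRINT(3),
        MODIFY(4),
        EXTRACT_CONTENT(5),
        MODIFY_ANNOTATIONS(6),
        FILL_IN_FORM(9),
        EXTRACT_FOR_ACCESSIBILITY(10),
        ASSEMBLE_DOCUMENT(11),
        PRINT_HIGH_QUALITY(12);

        // 1-based bit position in the PDF encryption dictionary's /P entry.
        final int pdfBit;

        DocumentPermission(int pdfBit) {
            this.pdfBit = pdfBit;
        }
    }

    // Translate a raw PDF /P value into the set of permissions it grants.
    static Set<DocumentPermission> fromPdfFlags(int p) {
        EnumSet<DocumentPermission> granted = EnumSet.noneOf(DocumentPermission.class);
        for (DocumentPermission perm : DocumentPermission.values()) {
            if ((p & (1 << (perm.pdfBit - 1))) != 0) {
                granted.add(perm);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // Example flags with only bits 3 (print) and 10 (accessibility) set.
        int flags = (1 << 2) | (1 << 9);
        System.out.println(fromPdfFlags(flags));
    }
}
```

A parser could then report the resulting set alongside the other metadata, and 
a caller could check for EXTRACT_CONTENT before deciding whether to hand back 
the text.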

> PDF Text extraction without permission
> --------------------------------------
>
>                 Key: TIKA-1489
>                 URL: https://issues.apache.org/jira/browse/TIKA-1489
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Tilman Hausherr
>
> In TIKA-1442, text extraction works on files like 717226.pdf that don't grant 
> text extraction permission. Permissions in PDF files are enforced only by the 
> application (i.e. PDFBox); the text itself isn't stored separately in 
> encrypted form. 
> The PDFBox ExtractText command-line tool does throw an exception.
> So I wonder why Tika is able to extract the text. Either Tika or the PDFBox 
> call it uses bypasses the permission check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
