[
https://issues.apache.org/jira/browse/TIKA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264112#comment-13264112
]
Nick Burch commented on TIKA-876:
---------------------------------
We still can't help you very much without a (small) sample file. Is there any
chance you could upload one?
If your PDFs really are wrapped in PKCS7, then we'll need something that
unpacks the PKCS7 wrapper and, for signed files, triggers the recursing parser
on the contents. That would be signed files only initially, as there's no way
to supply the private key for encrypted ones yet. I think BouncyCastle might
help with this; it's worth a look to start with.
In r1331634 I've added some MIME magic for PKCS7 files. I'm not sure if it's
quite right or not, but it seems OK for the few files I've tried. It'll need
someone who knows the PKCS7 format (or maybe just DER encoding?) to be sure
though. Ideally, we should distinguish between signed, encrypted and
signed+encrypted, but I'm not sure how we do that...
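One possible way to tell the variants apart, as a rough sketch only: a
DER-encoded PKCS7 file starts with a SEQUENCE whose first element is the
content type OID, and the last byte of that OID differs per type
(1.2.840.113549.1.7.2 is signedData, .3 is envelopedData, .4 is
signedAndEnvelopedData). The class and enum names below are hypothetical and
are not part of Tika or of the r1331634 change. It assumes binary DER/BER
input (not PEM/base64), and it only looks at the outer wrapper, so a file that
was signed first and then wrapped in an encrypted envelope would just show up
as encrypted.

```java
import java.util.Arrays;

// Hypothetical helper, not Tika API: sniffs the outer PKCS7 content type.
class Pkcs7Sniffer {
    enum Kind { SIGNED, ENCRYPTED, SIGNED_AND_ENCRYPTED, OTHER_PKCS7, NOT_PKCS7 }

    // DER encoding of the OID arc 1.2.840.113549.1.7 (PKCS7 content types);
    // the byte that follows it selects the specific content type.
    private static final byte[] PKCS7_PREFIX = {
        0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07
    };

    static Kind classify(byte[] data) {
        // Outer ContentInfo must be an ASN.1 SEQUENCE (tag 0x30).
        if (data == null || data.length < 2 || data[0] != 0x30) {
            return Kind.NOT_PKCS7;
        }
        int pos = 1;
        int len = data[pos++] & 0xFF;
        if (len > 0x80) {
            pos += len & 0x7F; // long form: skip the extra length octets
        }
        // len == 0x80 is BER indefinite length, with no extra octets to skip.

        // Expect OBJECT IDENTIFIER (tag 0x06) of length 9, then its 9 bytes.
        if (pos + 11 > data.length || data[pos] != 0x06 || data[pos + 1] != 0x09) {
            return Kind.NOT_PKCS7;
        }
        byte[] prefix = Arrays.copyOfRange(data, pos + 2, pos + 10);
        if (!Arrays.equals(prefix, PKCS7_PREFIX)) {
            return Kind.NOT_PKCS7;
        }
        switch (data[pos + 10]) {
            case 0x02: return Kind.SIGNED;               // signedData
            case 0x03: return Kind.ENCRYPTED;            // envelopedData
            case 0x04: return Kind.SIGNED_AND_ENCRYPTED; // signedAndEnvelopedData
            default:   return Kind.OTHER_PKCS7;          // data, digestedData, ...
        }
    }
}
```

Feeding it the first few dozen bytes of a file should be enough, since the OID
sits right after the outer SEQUENCE header. Whether plain magic matching can
express the variable-length header is a separate question.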
> Signed pdf parsing
> ------------------
>
> Key: TIKA-876
> URL: https://issues.apache.org/jira/browse/TIKA-876
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.0
> Environment: Java 6.0, Ubuntu
> Reporter: Fausto Cruzeiro de Moraes
> Labels: features
> Fix For: 1.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Is there an estimated date for implementing default parsing for signed
> documents, like signed pdf files (pk7s format), for example?