[
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746737#comment-16746737
]
Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:14 PM:
-------------------------------------------------------------------
Updated references are:
* [RFC-5652, Cryptographic Message Syntax
(CMS)|https://tools.ietf.org/html/rfc5652]
* [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2
Message Specification|https://tools.ietf.org/html/rfc5751]
* [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS
Structures|https://tools.ietf.org/html/rfc7468]
Tika currently looks for any "pkcs7" OID at the beginning of the file and, if one is
found, always returns "application/pkcs7-signature".
The OIDs that should be looked for are "pkcs7-signedData",
"pkcs7-envelopedData" and "id-smime-ct-compressedData".
There are three media types with "pkcs7-signedData" at the beginning, namely:
* "application/pkcs7-signature", extension ".p7s", when the signed content is
not present (detached signature)
* "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the
signed content is present
* "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when
there are only certificates and (optionally) CRLs
When the OID is "pkcs7-envelopedData" the media type is
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".
When the OID is "id-smime-ct-compressedData" the media type is
"application/pkcs7-mime; smime-type=compressed-data" and the extension is
".p7z".
Extension ".p7b" is registered in Tika with media type
"application/x-pkcs7-certificates", but I think such files have the same content
as ".p7c" files.
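As a rough illustration of the OID-to-media-type rules described above, here is a minimal Python sketch. It is not Tika code. The names `Detection` and `classify` and the flags `has_content` and `has_signers` are hypothetical; a real detector would have to derive the two flags by parsing the SignedData structure, which is not shown. Treating "no signed content and no signers" as the certs-only case is an assumption based on a reading of RFC 5751.

```python
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    media_type: str
    extension: str


def classify(content_type: str,
             has_content: bool = True,
             has_signers: bool = True) -> Optional[Detection]:
    """Map a CMS content type name to the media type and extension listed above.

    has_content and has_signers are hypothetical inputs; they only matter
    for "pkcs7-signedData".
    """
    if content_type == "pkcs7-signedData":
        if not has_content and not has_signers:
            # Assumption: only certificates and (optionally) CRLs are present.
            return Detection("application/pkcs7-mime; smime-type=certs-only", ".p7c")
        if not has_content:
            # Detached signature: the signed content lives elsewhere.
            return Detection("application/pkcs7-signature", ".p7s")
        return Detection("application/pkcs7-mime; smime-type=signed-data", ".p7m")
    if content_type == "pkcs7-envelopedData":
        return Detection("application/pkcs7-mime; smime-type=enveloped-data", ".p7m")
    if content_type == "id-smime-ct-compressedData":
        return Detection("application/pkcs7-mime; smime-type=compressed-data", ".p7z")
    return None  # not one of the three OIDs discussed here
```

Note that ".p7m" is shared by the signed-data and enveloped-data cases, so the extension alone cannot distinguish them; the media type parameter carries that information.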
Furthermore the label in the textual encoding is always PKCS7 (i.e. the file
begins with "-----BEGIN PKCS7").
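For the textual encoding, a detector would first have to strip the armor and decode the base64 body before it can look at the OID. A minimal Python sketch follows, assuming the RFC 7468 "PKCS7" label. The function name `unwrap_pkcs7_pem` is hypothetical, and the stricter line-format rules of RFC 7468 are deliberately not enforced.

```python
import base64
from typing import Optional

BEGIN = "-----BEGIN PKCS7-----"
END = "-----END PKCS7-----"


def unwrap_pkcs7_pem(text: str) -> Optional[bytes]:
    """Return the DER bytes between the PKCS7 armor lines, or None if absent."""
    start = text.find(BEGIN)
    if start < 0:
        return None
    stop = text.find(END, start + len(BEGIN))
    if stop < 0:
        return None
    # Drop line breaks and other whitespace inside the base64 body.
    body = "".join(text[start + len(BEGIN):stop].split())
    return base64.b64decode(body)
```

The returned bytes can then be inspected exactly like a binary (DER/BER) file.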
I can provide examples built with openssl, but to support these media types
Tika would need to:
* return parameters in media type when detecting streams
* return different extensions based on media type parameters
* further inspect streams when "-----BEGIN PKCS7" or "pkcs7-signedData" are
found (like it does for XML streams)
* register "application/pkcs7-signature" as a sub-class of
"application/pkcs7-mime"
* remove "application/x-pkcs7-certificates"
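For the stream inspection step in the list above, the three OIDs have to be matched in their DER-encoded form. The Python sketch below encodes them and searches the leading bytes of a stream. It is only an illustration, not how Tika's magic detection is implemented: the names `encode_oid`, `CMS_CONTENT_TYPES` and `find_content_type` and the 32-byte window are assumptions. The dotted OID values are the ones assigned to these content types in the PKCS #7 and S/MIME arcs.

```python
def encode_oid(dotted: str) -> bytes:
    """DER-encode a dotted OID string, including the 0x06 tag and the length."""
    arcs = [int(part) for part in dotted.split(".")]
    # The first two arcs share a single byte: 40 * first + second.
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        # Remaining arcs are base-128, high bit set on all but the last byte.
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append((arc & 0x7F) | 0x80)
            arc >>= 7
        body.extend(reversed(chunk))
    if len(body) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + bytes(body)


CMS_CONTENT_TYPES = {
    "pkcs7-signedData": "1.2.840.113549.1.7.2",
    "pkcs7-envelopedData": "1.2.840.113549.1.7.3",
    "id-smime-ct-compressedData": "1.2.840.113549.1.9.16.1.9",
}


def find_content_type(head: bytes, window: int = 32):
    """Return the name of a listed OID found in the first `window` bytes, else None."""
    for name, dotted in CMS_CONTENT_TYPES.items():
        if encode_oid(dotted) in head[:window]:
            return name
    return None
```

A match on "pkcs7-signedData" alone is not enough to pick among ".p7s", ".p7m" and ".p7c"; that still requires looking further into the structure.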
> Problem in Tika().detect for xml file signed in CADES
> -----------------------------------------------------
>
> Key: TIKA-1997
> URL: https://issues.apache.org/jira/browse/TIKA-1997
> Project: Tika
> Issue Type: Sub-task
> Components: detector
> Affects Versions: 1.13
> Environment: JDK 1.7
> Reporter: Michele Andreano
> Priority: Blocker
> Attachments: test.xml.p7m
>
>
> When I submit an XML file signed in P7M format to Tika, I expect Tika to return
> the MIME type application/pkcs7-mime, but it returns
> application/pkcs7-signature instead.
> How is that possible?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)