[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:24 PM:
-------------------------------------------------------------------

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".
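
The mapping described above can be sketched as a small lookup. This is only an illustration, not Tika code: the function name `classify` and the flags `has_content` and `has_signers` are hypothetical inputs that a detector would have to derive by parsing further into the structure. The dotted OID values are the standard ones for the three names. Treating "no content and no signers" as certs-only is an assumption based on the description of the degenerate SignedData case.

```python
from typing import Optional, Tuple

# Dotted forms of the three content-type OIDs named in the comment.
PKCS7_SIGNED_DATA = "1.2.840.113549.1.7.2"
PKCS7_ENVELOPED_DATA = "1.2.840.113549.1.7.3"
ID_SMIME_CT_COMPRESSED_DATA = "1.2.840.113549.1.9.16.1.9"


def classify(oid: str,
             has_content: bool = False,
             has_signers: bool = False) -> Optional[Tuple[str, str]]:
    """Return (media type, extension) for a content-type OID, or None.

    has_content and has_signers are hypothetical flags; they only matter
    for pkcs7-signedData, which maps to three different media types.
    """
    if oid == PKCS7_ENVELOPED_DATA:
        return ("application/pkcs7-mime; smime-type=enveloped-data", ".p7m")
    if oid == ID_SMIME_CT_COMPRESSED_DATA:
        return ("application/pkcs7-mime; smime-type=compressed-data", ".p7z")
    if oid == PKCS7_SIGNED_DATA:
        if has_content:
            # Signed content is embedded in the structure.
            return ("application/pkcs7-mime; smime-type=signed-data", ".p7m")
        if has_signers:
            # Detached signature: signer info present, content absent.
            return ("application/pkcs7-signature", ".p7s")
        # Assumption: neither content nor signers means certificates/CRLs only.
        return ("application/pkcs7-mime; smime-type=certs-only", ".p7c")
    return None
```

Note that ".p7m" alone is ambiguous: it is used for both signed-data and enveloped-data, so the extension cannot replace inspection of the OID.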

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates", but I think the content of such files is the 
same as that of ".p7c" files.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7").
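
As a rough sketch of handling the textual encoding, the following strips the "PKCS7" label lines and decodes the Base64 body back to binary, so that the same OID inspection can be applied to both forms. The function name `pem_pkcs7_to_der` is hypothetical, and the code assumes the label is exactly "PKCS7" as stated above.

```python
import base64
import re
from typing import Optional

# Matches the body between the PKCS7 begin/end label lines.
_PEM_PKCS7 = re.compile(
    rb"-----BEGIN PKCS7-----(.*?)-----END PKCS7-----", re.DOTALL
)


def pem_pkcs7_to_der(data: bytes) -> Optional[bytes]:
    """Return the decoded binary body, or None if this is not PKCS7 text."""
    if not data.lstrip().startswith(b"-----BEGIN PKCS7"):
        return None
    match = _PEM_PKCS7.search(data)
    if match is None:
        return None
    # Remove line breaks and other whitespace before decoding.
    body = b"".join(match.group(1).split())
    try:
        return base64.b64decode(body, validate=True)
    except ValueError:
        # Covers binascii.Error raised for malformed Base64.
        return None
```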

I can provide examples, built using openssl, but to support those media types 
Tika would need to:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7" or "pkcs7-signedData" is 
found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime"
 * register "application/x-pkcs7-certificates" as an alias of 
"application/pkcs7-mime; smime-type=certs-only"
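
To illustrate what "further inspect streams" could mean for the binary form, here is a minimal sketch that reads the outer SEQUENCE header and the content-type OID that follows it. It is not how Tika works today; the function names are hypothetical, and only the first two elements are parsed. Deciding between signed-data, detached signature and certs-only would require reading deeper into the structure.

```python
from typing import Optional, Tuple


def read_length(buf: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Read a BER/DER length at pos; return (length, next position).

    The length is None for the indefinite form (0x80).
    """
    first = buf[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    if first == 0x80:
        return None, pos
    count = first & 0x7F
    return int.from_bytes(buf[pos:pos + count], "big"), pos + count


def decode_oid(body: bytes) -> str:
    """Decode the content octets of an OBJECT IDENTIFIER to dotted form."""
    arcs = []
    value = 0
    for byte in body:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last octet of an arc
            arcs.append(value)
            value = 0
    first = arcs[0]
    if first < 40:
        head = [0, first]
    elif first < 80:
        head = [1, first - 40]
    else:
        head = [2, first - 80]
    return ".".join(str(arc) for arc in head + arcs[1:])


def content_type_oid(buf: bytes) -> Optional[str]:
    """Return the leading content-type OID, or None if the prefix does not fit."""
    if len(buf) < 2 or buf[0] != 0x30:  # outer SEQUENCE tag
        return None
    _, pos = read_length(buf, 1)
    if pos + 1 >= len(buf) or buf[pos] != 0x06:  # OBJECT IDENTIFIER tag
        return None
    length, pos = read_length(buf, pos + 1)
    if not length or pos + length > len(buf):
        return None
    return decode_oid(buf[pos:pos + length])
```

Only a few dozen bytes of the stream are needed for this check, which keeps it compatible with detection that works on a limited prefix.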

 


was (Author: roberto.benedetti):
Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7" or "pkcs7-signedData" are 
found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime"
 * register "application/x-pkcs7-certificates" as an alias of 
"application/pkcs7-mime"

 

> Problem in Tika().detect for xml file signed in CADES
> -----------------------------------------------------
>
>                 Key: TIKA-1997
>                 URL: https://issues.apache.org/jira/browse/TIKA-1997
>             Project: Tika
>          Issue Type: Sub-task
>          Components: detector
>    Affects Versions: 1.13
>         Environment: JDK 1.7
>            Reporter: Michele Andreano
>            Priority: Blocker
>         Attachments: test.xml.p7m
>
>
> When I submit an XML file signed in P7M format to Tika, I expect it to return 
> the MIME type application/pkcs7-mime, but it returns 
> application/pkcs7-signature instead.
> How is this possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
