[ https://issues.apache.org/jira/browse/NIFI-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon DeVries updated NIFI-4718:
----------------------------------
Description:
IdentifyMimeType uses Tika configured with a custom-mimetypes.xml\[1] to
specify (among others) the flowfile-v* MIME types. However, these entries do
not include priorities. Therefore, a NiFi FlowFile V3 package whose payload
contains, for example, HTML including the string:
{code}
<html xmlns=
{code}
will be identified as "application/xhtml+xml" \[2], which matches the
pattern but is less accurate than "application/flowfile-v3". To
fix this, I believe we need to specify a higher priority for the FlowFile V3
"magic"...
\[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml#L26-L31
\[2]
https://gitbox.apache.org/repos/asf?p=tika.git;a=blob;f=tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml;hb=refs/heads/master
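For illustration only, a custom-mimetypes.xml entry with an explicit priority might look like the sketch below. It assumes that Tika's <magic> element accepts a "priority" attribute (defaulting to 50 when omitted) and that the FlowFile V3 magic is the string "NiFiFF3" at offset 0. The value 60 is a hypothetical choice, not a tested fix, and any other child elements of the real entry are left out.
{code:xml}
<mime-type type="application/flowfile-v3">
  <!-- Hypothetical priority: any value above the competing XHTML magic should serve -->
  <magic priority="60">
    <match value="NiFiFF3" type="string" offset="0"/>
  </magic>
</mime-type>
{code}
If Tika evaluates magics in descending priority order, the FlowFile V3 header would then be tested before the "<html xmlns=" pattern, so the package type would win over the payload type.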
was:
IdentifyMimeType uses tika configured with a custom-mimetypes.xml\[1] to
specify (among others) the flowfile-v* mime types. However, these do not
include priorities. Therefore, a NiFi FlowFile V3 package with a payload
containing, for example, html including the string:
{code}
<html xmlns=
{code}
will be identified as "application/xhtml+xml" \[2] which, while matching the
pattern, is not as correct as identifying it as application/flowfile-v3. To
fix this, I believe we need to specify a higher priority for the FlowFile V3
"magic"...
\[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml#L26-L31
\2]
https://gitbox.apache.org/repos/asf?p=tika.git;a=blob;f=tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml;hb=refs/heads/master
> IdentifyMimeType: increase priority for FFv3
> --------------------------------------------
>
> Key: NIFI-4718
> URL: https://issues.apache.org/jira/browse/NIFI-4718
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Brandon DeVries
> Priority: Minor
>
> IdentifyMimeType uses Tika configured with a custom-mimetypes.xml\[1] to
> specify (among others) the flowfile-v* MIME types. However, these entries do
> not include priorities. Therefore, a NiFi FlowFile V3 package whose payload
> contains, for example, HTML including the string:
> {code}
> <html xmlns=
> {code}
> will be identified as "application/xhtml+xml" \[2], which matches the
> pattern but is less accurate than "application/flowfile-v3". To
> fix this, I believe we need to specify a higher priority for the FlowFile V3
> "magic"...
> \[1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml#L26-L31
> \[2]
> https://gitbox.apache.org/repos/asf?p=tika.git;a=blob;f=tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml;hb=refs/heads/master