[
https://issues.apache.org/jira/browse/NIFI-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679236#comment-17679236
]
Mark Ward commented on NIFI-11084:
----------------------------------
Forgot to add: here's a link to the Tika rule that I believe is causing
this file to be misidentified:
https://github.com/digipres/digipres.github.io/blob/5156d6d882bc0b9d73682e56d6cb92ff158de7b8/_sources/registries/tika/tika-mimetypes.xml#L5012
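
To illustrate why a rule like this would catch the attached file, here is a
simplified sketch in Python. It is not Tika's implementation. It assumes the
linked rule is a plain string match for `P2` at offset 0, and that a content
match takes precedence over the filename, which is what the reported behavior
suggests. The rule tables and the `detect` function are hypothetical names
made up for this example.

```python
# Simplified illustration only; this is NOT how Tika is implemented.
# Assumption: the linked rule is a plain string match for "P2" at offset 0.
HYPOTHETICAL_MAGIC_RULES = [
    # (offset, magic bytes, mime type, extension)
    (0, b"P2", "image/x-portable-graymap", "pgm"),
]

# Assumption: filename patterns are only consulted when no magic rule matches.
HYPOTHETICAL_GLOB_RULES = {
    ".csv": ("text/csv", "csv"),
}


def detect(filename: str, content: bytes):
    """Return (mime type, extension); a content match wins over the filename."""
    for offset, magic, mime, ext in HYPOTHETICAL_MAGIC_RULES:
        if content[offset:offset + len(magic)] == magic:
            return mime, ext
    for suffix, result in HYPOTHETICAL_GLOB_RULES.items():
        if filename.lower().endswith(suffix):
            return result
    return "application/octet-stream", ""


# A CSV whose first field happens to start with "P2".
sample = b"P2,some,other,columns\n1,2,3,4\n"
print(detect("mime_type_mis-id_file.csv", sample))
```

Under these assumptions the `.csv` name is never consulted, because the first
two bytes already satisfy the graymap rule. A CSV that starts with any other
characters falls through to the filename and is reported as `text/csv`.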
> Character/text data "mis-identified" by IdentifyMimeType processor
> ------------------------------------------------------------------
>
> Key: NIFI-11084
> URL: https://issues.apache.org/jira/browse/NIFI-11084
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.15.2, 1.19.1
> Environment: Windows Server 2019, Java 11.0.17
> Reporter: Mark Ward
> Priority: Minor
> Attachments: mime_type_mis-id_file.csv
>
>
> When *IdentifyMimeType* is presented with a text file that has a `.csv`
> extension and whose content begins with the two characters `P2`, the
> processor mis-identifies `mime.extension` as `pgm` and `mime.type` as
> `image/x-portable-graymap`.
> The processor's *Use Filename In Detection* property is set to `true`.
> An example file is attached and the following flow can be used to reproduce:
> *GetFile* > *IdentifyMimeType*, where the output flowfile's attributes can
> be inspected.
> This has been tested on NiFi versions `1.15.2` and `1.19.1`, both running on a
> Windows Server 2019 instance.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)