[ 
https://issues.apache.org/jira/browse/NIFI-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679236#comment-17679236
 ] 

Mark Ward commented on NIFI-11084:
----------------------------------

Forgot to add, here's a link to the appropriate Tika rule I believe is causing 
this file to be misidentified:

https://github.com/digipres/digipres.github.io/blob/5156d6d882bc0b9d73682e56d6cb92ff158de7b8/_sources/registries/tika/tika-mimetypes.xml#L5012
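
For illustration only, here is a minimal sketch of how a content "magic" rule could override the filename hint. It is a simplified stand-in, not Tika's or NiFi's actual code. It assumes the rule matches the literal bytes `P2` at offset 0 and that a content match takes precedence over the extension, which is what the observed result suggests. The names `MAGIC_RULES`, `EXTENSION_RULES` and `detect` are hypothetical.

```python
# Simplified stand-in for magic-byte detection; NOT Tika's real implementation.
# Assumption: the PGM rule matches the literal bytes "P2" at offset 0.
MAGIC_RULES = [
    (b"P2", "image/x-portable-graymap", "pgm"),
]

# Assumption: filename hints are only consulted when no magic rule matches.
EXTENSION_RULES = {
    ".csv": ("text/csv", "csv"),
}


def detect(filename: str, content: bytes):
    """Return (mime_type, extension); a magic match wins over the filename."""
    for prefix, mime_type, ext in MAGIC_RULES:
        if content.startswith(prefix):
            return mime_type, ext
    for suffix, result in EXTENSION_RULES.items():
        if filename.lower().endswith(suffix):
            return result
    return "application/octet-stream", ""
```

Under these assumptions, a file named `data.csv` whose content begins with `P2` is reported as `image/x-portable-graymap`, while the same filename with any other leading characters falls back to `text/csv`.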

> Character/text data "mis-identified" by IdentifyMimeType processor
> ------------------------------------------------------------------
>
>                 Key: NIFI-11084
>                 URL: https://issues.apache.org/jira/browse/NIFI-11084
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.15.2, 1.19.1
>         Environment: Windows Server 2019, Java 11.0.17
>            Reporter: Mark Ward
>            Priority: Minor
>         Attachments: mime_type_mis-id_file.csv
>
>
> When *IdentifyMimeType* is presented with a text file with a `.csv` extension 
> and the first two characters of the content as `P2`, the processor 
> mis-identifies the mime.extension as `pgm` and mime.type as 
> `image/x-portable-graymap`.
> The processor's *Use Filename In Detection* property is set to `true`.
> An example file is attached and the following flow can be used to reproduce: 
> *GetFile* > *IdentifyMimeType* where the outputted flowfile's attributes can 
> be inspected.
> This has been tested on NiFi versions `1.15.2` and `1.19.1`, both running on a 
> Windows Server 2019 instance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
