[
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481534#comment-15481534
]
Andre commented on NIFI-2374:
-----------------------------
[~joewitt]
Not sure if we are on the same page, but this is truly just a version bump, with no
added functionality, especially around metadata extraction via parsers.
1 - I am not sure if we need the parsers to be honest... If I understand Tika
correctly, the core library does identification while the Parsers would allow
us to extract metadata from the identified files.
I base this understanding on the following excerpt from the URL you linked:
bq. Please note that Apache Tika is able to detect a much wider range of
formats than those listed below, this page only documents those formats from
which Tika is able to extract metadata and/or textual content.
2 - The list is for parsers, not for the "file magic" performed by the
[Detector|https://tika.apache.org/1.13/api/org/apache/tika/detect/Detector.html]
we call here:
https://github.com/apache/nifi/blob/f987b216090f29719976ed1693be2ea358523aa5/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java#L134
I tried to find a better list but couldn't. :-(
3 - Very valid point... AFAIK there are no changes with regard to NIFI-2667 :-)
So just to emphasise again, my idea was just to bump the dependency version,
without adding any additional Tika feature. Let me know if you would like any
extra action; I will be happy to address it.
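
To illustrate the detection-versus-parsing distinction from points 1 and 2, here is a minimal sketch using only the JDK. It is not Tika's or NiFi's actual implementation; the class name MagicSketch, its methods, and the manifest-based JAR heuristic are all assumptions made for illustration. It shows that a plain ZIP and a JAR-like archive share the same leading "magic" bytes, so telling them apart requires looking inside the container.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; NOT how Tika or IdentifyMimeType is implemented.
class MagicSketch {

    // Magic-only check: the ZIP local file header signature "PK\003\004".
    static boolean hasZipMagic(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    // Container-aware check: walk the entries and look for a JAR manifest.
    // Assumption: an archive containing META-INF/MANIFEST.MF is treated as a JAR.
    static String detect(byte[] data) {
        if (!hasZipMagic(data)) {
            return "application/octet-stream";
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("META-INF/MANIFEST.MF".equals(entry.getName())) {
                    return "application/java-archive";
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "application/zip";
    }

    // Helper that builds a one-entry archive in memory for the demo.
    static byte[] archiveWith(String entryName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("demo".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plainZip = archiveWith("readme.txt");
        byte[] jarLike = archiveWith("META-INF/MANIFEST.MF");
        // Both archives pass the magic-byte check...
        System.out.println(hasZipMagic(plainZip) + " " + hasZipMagic(jarLike));
        // ...but only inspecting the entries tells them apart.
        System.out.println(detect(plainZip));
        System.out.println(detect(jarLike));
    }
}
```

Both sample archives pass the magic-byte check, so the sketch only separates them by reading the entry names. That is the kind of container inspection that goes beyond simple "file magic".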
> IdentifyMimeType documentation is misleading
> --------------------------------------------
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.0.0, 0.7.0
> Reporter: Andre
> Assignee: Andre
> Priority: Minor
> Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType mentions the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>