[ https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412336#comment-15412336 ]
Joseph Witt commented on NIFI-2374:
-----------------------------------
Hello [~trixpan]. I've moved this to 1.1.0, given when it came into the
release cycle and what work appears to remain. Findings:
1) The only thing we're depending on right now is tika-core, so it doesn't
include all the parsers.
2) The list you reference as parsers is great, but we need to validate which
formats we actually include parsers for. We can probably get this
programmatically. If not, this list appears safer to use than the asf-git repo
entry: "https://tika.apache.org/1.13/formats.html#Full_list_of_Supported_Formats"
3) We need to review the version changes involved here, because if they change
dependencies (and we'd definitely need to watch that), then we need to account
for them in all the L&N (LICENSE and NOTICE) files.
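To make point 1 concrete: looking at content bytes alone, a signature-only detector cannot tell a JAR from a ZIP, because a JAR is a ZIP archive and both begin with the same `PK\003\004` local-file-header bytes. The sketch below is a hypothetical, JDK-only stand-in for that kind of magic-byte matching. It is not tika-core's actual implementation, and the `MagicSniffer` class and its two-entry signature table are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class MagicSniffer {
    // Hypothetical, deliberately tiny signature table (MIME type -> leading bytes).
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 3, 4});
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
    }

    // Returns the first type whose signature is a prefix of the data.
    static String detect(byte[] data) {
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            byte[] sig = e.getValue();
            if (data.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(data, sig.length), sig)) {
                return e.getKey();
            }
        }
        return "application/octet-stream";
    }

    // Builds a one-entry archive in memory; when jar is true it is a JAR with a manifest.
    static byte[] archive(boolean jar) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (ZipOutputStream zos = jar ? new JarOutputStream(bos, mf) : new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Both lines report application/zip: the signature cannot see inside the archive.
        System.out.println("zip -> " + detect(archive(false)));
        System.out.println("jar -> " + detect(archive(true)));
    }
}
```

Running `main` should print `application/zip` for both archives. Closing that gap requires a detector that opens the container, which is the kind of logic that lives outside tika-core.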
One idea to consider is to split Tika parsers/detection out into its own NAR,
because it could be quite large and quite powerful, and it would have some
pretty specific dependency implications. Tika is no doubt very cool and
powerful, so we should figure out the best way to get it incorporated.
> IdentifyMimeType documentation is misleading
> --------------------------------------------
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.0.0, 0.7.0
> Reporter: Andre
> Assignee: Andre
> Priority: Minor
> Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType mentions the processor is
> capable of identifying a reasonably small range of file types.
> However, inspecting the code shows that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>
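The ZIP-versus-JAR case in the description shows why detection needs more than signatures: the archive has to be opened. The following is a simplified, JDK-only sketch of that container inspection. It is not Tika's actual container detector. The heuristic (a `META-INF/MANIFEST.MF` or `.class` entry means JAR) and the `ContainerSniffer` name are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ContainerSniffer {
    // Step 1: the ZIP local-file-header signature, shared by ZIP and JAR files.
    static boolean hasZipMagic(byte[] data) {
        return data.length >= 4 && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Step 2: walk the entries to refine the type (simplified, hypothetical heuristic).
    static String detect(byte[] data) {
        if (!hasZipMagic(data)) {
            return "application/octet-stream";
        }
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("META-INF/MANIFEST.MF") || name.endsWith(".class")) {
                    return "application/java-archive";
                }
            }
        } catch (IOException e) {
            // Unreadable or truncated container: fall back to the generic ZIP type below.
        }
        return "application/zip";
    }

    // Builds an in-memory archive with one entry; when jar is true it also gets a manifest.
    static byte[] archive(boolean jar, String entryName) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (ZipOutputStream zos = jar ? new JarOutputStream(bos, mf) : new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("data".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(detect(archive(false, "notes.txt")));           // application/zip
        System.out.println(detect(archive(true, "notes.txt")));            // application/java-archive
        System.out.println(detect(archive(false, "com/example/A.class"))); // application/java-archive
    }
}
```

`ZipInputStream` is used rather than `JarInputStream` on purpose: `JarInputStream` consumes the manifest entry while reading it, so the manifest would never appear in the loop. A real detector would check many more container formats (for example, the various ZIP-based office formats), which is part of why the parser dependencies are large.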
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)