[
https://issues.apache.org/jira/browse/NIFI-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325409#comment-14325409
]
Joseph Witt commented on NIFI-296:
----------------------------------
The other thing I'll add is that we need to be sure we understand Tika's
behavior with regard to memory consumption. When detecting the MIME type of
formats that don't have magic headers (or other content at the front that makes
the format obvious), does Tika load everything into memory, or does it
efficiently stream the data through a small buffer? That is most likely easy
enough to test. I think 7z is one of those 'puts the stuff towards the end'
types of formats.
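
One rough way to run that test is to wrap the input in a counting stream and
see how many bytes the detector actually pulls. The sketch below is only an
illustration under stated assumptions: `DetectionReadProbe`, `StreamDetector`,
`headerOnlyDetect` and `readAllDetect` are hypothetical names, and the two
detectors are stand-ins rather than Tika code. A real check would plug a Tika
call (for example the `detect(InputStream)` method on the `Tika` facade) into
the same harness.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DetectionReadProbe {

    /** Counts every byte a consumer reads or skips from the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        private long bytesRead;

        CountingInputStream(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) bytesRead++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }

        @Override
        public long skip(long count) throws IOException {
            long n = super.skip(count);
            if (n > 0) bytesRead += n;
            return n;
        }

        long getBytesRead() { return bytesRead; }
    }

    /** Hypothetical detector shape; a real test would call Tika here. */
    interface StreamDetector {
        String detect(InputStream in) throws IOException;
    }

    /** Stand-in: peeks at a bounded prefix, then resets the stream. */
    static String headerOnlyDetect(InputStream in) throws IOException {
        final int limit = 8;
        in.mark(limit);
        byte[] head = new byte[limit];
        int n = in.read(head, 0, limit);
        in.reset();
        // ZIP local file headers begin with the bytes 'P' 'K'.
        boolean zip = n >= 2 && head[0] == 'P' && head[1] == 'K';
        return zip ? "application/zip" : "application/octet-stream";
    }

    /** Stand-in: the worst case, a detector that drains the whole stream. */
    static String readAllDetect(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        int first = -1, second = -1;
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            if (total == 0 && n >= 2) { first = buf[0]; second = buf[1]; }
            total += n;
        }
        boolean zip = first == 'P' && second == 'K';
        return zip ? "application/zip" : "application/octet-stream";
    }

    /** Runs one detector over the payload and reports bytes it consumed. */
    static long bytesConsumed(StreamDetector d, byte[] payload) throws IOException {
        CountingInputStream counted =
                new CountingInputStream(new ByteArrayInputStream(payload));
        d.detect(counted);
        return counted.getBytesRead();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1 << 20]; // 1 MiB of zeros with a ZIP-like prefix
        payload[0] = 'P';
        payload[1] = 'K';
        System.out.println("header-only: " + bytesConsumed(DetectionReadProbe::headerOnlyDetect, payload));
        System.out.println("read-all:    " + bytesConsumed(DetectionReadProbe::readAllDetect, payload));
    }
}
```

This only counts bytes read, not heap held. To see whether the data is kept in
memory, the same run could be repeated against a large file with a small `-Xmx`
setting or under a profiler.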
> Extend the capability of IdentifyMimeType and extract document metadata
> -----------------------------------------------------------------------
>
> Key: NIFI-296
> URL: https://issues.apache.org/jira/browse/NIFI-296
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Joseph Witt
> Priority: Minor
>
> Apache Tika is pretty awesome and can handle a large range of document types.
> It could perhaps be used to extend the capability of IdentifyMimeType and it
> could also potentially be used to automatically extract document
> metadata/data as flow file attributes to be used for data flow routing
> decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)