[ https://issues.apache.org/jira/browse/NIFI-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325409#comment-14325409 ]

Joseph Witt commented on NIFI-296:
----------------------------------

The other thing I'll add is that we need to be sure we understand Tika's
behavior with regard to memory consumption. When Tika does MIME detection on
formats that don't have magic headers (i.e. nothing at the front of the file
that makes the format obvious), does it load everything into memory, or does it
efficiently stream the data through a small buffer? That is most likely easy
enough to test. I think 7z is one of those "puts the stuff towards the end"
formats.
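A minimal sketch of such a test, assuming plain Java and no Tika dependency. The `detect` method below is a hypothetical stand-in for a magic-header detector, not Tika's API: it marks the stream, inspects at most `PREFIX_LIMIT` bytes, then resets. The hypothetical `CountingInputStream` records how many bytes were actually pulled from a 10 MiB source, so a count on the order of the `BufferedInputStream` size (4096 here, an arbitrary choice) rather than the payload size indicates bounded buffering. To probe Tika itself, one could pass the same wrapped stream to `org.apache.tika.Tika#detect(InputStream)` in place of `detect` and compare `bytesRead`, repeating with a real 7z sample.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixDetectionCheck {

    /** Hypothetical wrapper: counts bytes pulled from the wrapped source stream. */
    static final class CountingInputStream extends FilterInputStream {
        long bytesRead = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) bytesRead++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }
    }

    static final int PREFIX_LIMIT = 64; // hypothetical cap on bytes inspected
    static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    static final byte[] SEVEN_Z_MAGIC = {'7', 'z', (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    /** Hypothetical detector: reads only a bounded prefix, then rewinds the stream. */
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PREFIX_LIMIT);
        byte[] prefix = new byte[PREFIX_LIMIT];
        int filled = 0;
        try {
            while (filled < PREFIX_LIMIT) {
                int n = in.read(prefix, filled, PREFIX_LIMIT - filled);
                if (n < 0) break; // EOF before the limit
                filled += n;
            }
        } finally {
            in.reset(); // leave the stream positioned at its start for downstream readers
        }
        if (startsWith(prefix, filled, PDF_MAGIC)) return "application/pdf";
        if (startsWith(prefix, filled, SEVEN_Z_MAGIC)) return "application/x-7z-compressed";
        return "application/octet-stream";
    }

    static boolean startsWith(byte[] data, int len, byte[] magic) {
        return len >= magic.length && Arrays.equals(data, 0, magic.length, magic, 0, magic.length);
    }

    public static void main(String[] args) throws IOException {
        // 10 MiB payload: a PDF header followed by zero padding.
        byte[] payload = new byte[10 * 1024 * 1024];
        System.arraycopy(PDF_MAGIC, 0, payload, 0, PDF_MAGIC.length);

        CountingInputStream counted = new CountingInputStream(new ByteArrayInputStream(payload));
        InputStream in = new BufferedInputStream(counted, 4096);

        String type = detect(in);
        System.out.println(type + " after pulling " + counted.bytesRead
                + " of " + payload.length + " bytes");
        System.out.println("first byte after reset: " + (char) in.read());
    }
}
```

This only measures bytes read from the source, not heap allocated internally by the detector; running the real detector with a `-Xmx` cap smaller than the sample file would be a complementary check.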

> Extend the capability of IdentifyMimeType and extract document metadata
> -----------------------------------------------------------------------
>
>                 Key: NIFI-296
>                 URL: https://issues.apache.org/jira/browse/NIFI-296
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Joseph Witt
>            Priority: Minor
>
> Apache Tika is pretty awesome and can handle a large range of document types.
> It could perhaps be used to extend the capability of IdentifyMimeType, and it
> could also potentially be used to automatically extract document
> metadata/data as flow file attributes to be used for data flow routing
> decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
