[
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813566#comment-16813566
]
Tim Allison commented on TIKA-2849:
-----------------------------------
If you want only magic-based MIME detection, you can turn off the other
detectors, especially the zip container detector. Perhaps just use tika-core
on its own? You won't necessarily get the OOXML subtypes, e.g. docx vs. xlsx,
because both are zip archives that start with the same magic bytes. For some
file types you really do need the full stream to do fine-grained detection,
but if you see areas for improvement, our GitHub site is open for PRs and
committers are standing by. :D
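
To illustrate the trade-off, here is a minimal, stdlib-only sketch of what magic-only detection amounts to. It is not Tika's implementation: the class `MagicSniffer`, its methods `detect` and `fromHeader`, the 8-byte peek limit, and the two-entry signature table are all hypothetical and chosen only for illustration. It peeks at the first few bytes using mark/reset, so nothing is spooled to disk, but it can only report `application/zip` for any OOXML file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of Tika.
final class MagicSniffer {
    // Assumed peek size; a real detector would use its longest signature.
    private static final int PEEK_LIMIT = 8;
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4"
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};

    // Reads at most PEEK_LIMIT bytes, then rewinds the stream to where it was.
    static String detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[PEEK_LIMIT];
        int filled = 0;
        in.mark(PEEK_LIMIT);
        try {
            try {
                while (filled < PEEK_LIMIT) {
                    int n = in.read(head, filled, PEEK_LIMIT - filled);
                    if (n < 0) {
                        break; // stream is shorter than the peek window
                    }
                    filled += n;
                }
            } finally {
                in.reset(); // leave the stream untouched for the caller
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromHeader(head, filled);
    }

    // Pure lookup on the leading bytes; docx and xlsx both land on "zip".
    static String fromHeader(byte[] head, int length) {
        if (startsWith(head, length, ZIP)) {
            return "application/zip";
        }
        if (startsWith(head, length, PDF)) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] head, int length, byte[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A stream that does not support mark/reset would need to be wrapped in a `java.io.BufferedInputStream` before calling `detect`. Telling docx from xlsx requires looking inside the archive, which is the part that needs more than a short prefix of the stream.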
> TikaInputStream copies the input stream locally
> -----------------------------------------------
>
> Key: TIKA-2849
> URL: https://issues.apache.org/jira/browse/TIKA-2849
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.20
> Reporter: Boris Petrov
> Priority: Major
>
> When doing "tika.detect(stream, name)" and the stream is a "TikaInputStream",
> execution gets to "TikaInputStream#getPath" which does a "Files.copy(in,
> path, REPLACE_EXISTING);" which is very, very bad. This input stream could
> be, as in our case, an input stream from a network file which is tens or
> hundreds of gigabytes large. Copying it locally is a huge waste of resources
> to say the least. Why does it do that and can I make it not do it? Or is this
> something that has to be fixed in Tika?