[ 
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812492#comment-16812492
 ] 

Ken Krugler commented on TIKA-2849:
-----------------------------------

Hi [~boris-petrov] - a few things here. First, do you have the call stack 
leading down to the {{TikaInputStream#getPath}} call? Second, it would be great 
if we could have this discussion on the user mailing list before an issue gets 
created, thanks!

Finally, I'm not an expert on this part of the code, but I know that some of 
the parsers (from other open source projects) wrapped by Tika require a local 
file as input - in those situations, Tika uses {{TikaInputStream#getPath}} to 
create a local file that can be passed in. What's the type of data being 
streamed in?
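
To illustrate the kind of spooling I mean, here is a minimal sketch. It is not 
Tika's actual code: the class name {{SpoolSketch}} and the method 
{{spoolToTempFile}} are made up for this example, and only the 
{{Files.copy(in, path, REPLACE_EXISTING)}} call is taken from the issue 
description. The point is that the whole stream gets read and written to local 
disk, so the cost grows with the size of the input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; this is not Tika's source code.
class SpoolSketch {

    // Reads the stream to its end and writes every byte to a temporary file.
    // createTempFile already creates an empty file, hence REPLACE_EXISTING.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path path = Files.createTempFile("spool-sketch-", ".tmp");
        Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
        return path;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example bytes".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            Path path = spoolToTempFile(in);
            System.out.println(path + " holds " + Files.size(path) + " bytes");
            Files.delete(path); // clean up the temporary copy
        }
    }
}
```

With a small in-memory stream this is harmless; with a stream backed by a very 
large remote file, the same call would copy the entire file locally.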

> TikaInputStream copies the input stream locally
> -----------------------------------------------
>
>                 Key: TIKA-2849
>                 URL: https://issues.apache.org/jira/browse/TIKA-2849
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Boris Petrov
>            Priority: Major
>
> When doing "tika.detect(stream, name)" and the stream is a "TikaInputStream", 
> execution gets to "TikaInputStream#getPath" which does a "Files.copy(in, 
> path, REPLACE_EXISTING);" which is very, very bad. This input stream could 
> be, as in our case, an input stream from a network file which is tens or 
> hundreds of gigabytes large. Copying it locally is a huge waste of resources 
> to say the least. Why does it do that and can I make it not do it? Or is this 
> something that has to be fixed in Tika?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
