[ 
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081294#comment-17081294
 ] 

Luís Filipe Nassif commented on TIKA-2849:
------------------------------------------

[~tallison] actually parse time is not the big problem, but large temp files 
exhausting limited disk resources and breaking parsing of other smaller files 
or other parts of the application. BoundedInputStream would hurt parsers that 
do not need to create temp files...

And all TikaInputStream constructors are private, so we can not extend it. If 
that could be changed, users could extend to their needs, what do you think?

> TikaInputStream copies the input stream locally
> -----------------------------------------------
>
>                 Key: TIKA-2849
>                 URL: https://issues.apache.org/jira/browse/TIKA-2849
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Boris Petrov
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.21
>
>
> When doing "tika.detect(stream, name)" and the stream is a "TikaInputStream", 
> execution gets to "TikaInputStream#getPath" which does a "Files.copy(in, 
> path, REPLACE_EXISTING);" which is very, very bad. This input stream could 
> be, as in our case, an input stream from a network file which is tens or 
> hundreds of gigabytes large. Copying it locally is a huge waste of resources 
> to say the least. Why does it do that and can I make it not do it? Or is this 
> something that has to be fixed in Tika?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to