Jukka Zitting created TIKA-1190:
-----------------------------------

             Summary: ZipContainerDetector.detect() can spool the entire stream to a temporary file
                 Key: TIKA-1190
                 URL: https://issues.apache.org/jira/browse/TIKA-1190
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting


As noted in a TODO comment, currently the {{ZipContainerDetector}} calls 
{{getFile()}} on a given {{TikaInputStream}} instance (that looks like a ZIP 
archive) without using the {{hasFile()}} method to check whether a backing file 
is actually available.

This is troublesome because {{getFile()}} spools the entire stream to a 
temporary file when no backing file exists, an unexpected performance cost for 
a file that might not be needed at all after the detection.

A better approach would be to only do the more detailed "full file" format 
detection if the backing file is already available, i.e. if {{hasFile()}} 
returns true.
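
A minimal sketch of the proposed guard, not the actual Tika code. To stay 
self-contained it uses a hypothetical {{SpoolingStream}} class that only models 
the {{hasFile()}}/{{getFile()}} behaviour described above; the 
{{looksLikeZip()}} and {{detectFromFile()}} helpers, the {{spoolCount}} field 
and the returned type strings are likewise assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class HasFileGuardSketch {

    /** Hypothetical stand-in for TikaInputStream; models only hasFile()/getFile(). */
    static final class SpoolingStream {
        private final InputStream in;
        private Path file;   // null until a backing file exists
        int spoolCount = 0;  // counts how often the stream was spooled to disk

        SpoolingStream(InputStream in, Path file) {
            this.in = in;
            this.file = file;
        }

        boolean hasFile() {
            return file != null;
        }

        /** Returns the backing file, spooling the whole stream to a temp file if needed. */
        Path getFile() throws IOException {
            if (file == null) {
                file = Files.createTempFile("sketch-", ".tmp");
                file.toFile().deleteOnExit();
                Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
                spoolCount++;
            }
            return file;
        }
    }

    /** Checks for the local file header magic "PK\3\4". */
    static boolean looksLikeZip(byte[] prefix) {
        return prefix.length >= 4
                && prefix[0] == 'P' && prefix[1] == 'K'
                && prefix[2] == 3 && prefix[3] == 4;
    }

    /** Illustrative "full file" detection: reads a "mimetype" entry if one is present. */
    static String detectFromFile(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            ZipEntry entry = zip.getEntry("mimetype");
            if (entry == null) {
                return "application/zip";
            }
            try (InputStream entryStream = zip.getInputStream(entry)) {
                return new String(entryStream.readAllBytes(), StandardCharsets.US_ASCII).trim();
            }
        }
    }

    static String detect(SpoolingStream stream, byte[] prefix) throws IOException {
        if (!looksLikeZip(prefix)) {
            return "application/octet-stream";
        }
        // Only do the detailed detection when a backing file already exists,
        // so that detection alone never forces the stream to be spooled.
        if (stream.hasFile()) {
            return detectFromFile(stream.getFile());
        }
        return "application/zip"; // generic fallback, no temporary file created
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'P', 'K', 3, 4, 0, 0, 0, 0};
        SpoolingStream stream = new SpoolingStream(new ByteArrayInputStream(data), null);
        System.out.println(detect(stream, data) + ", spooled=" + stream.hasFile());
    }
}
```

With this guard a stream that has no backing file is reported as a generic ZIP 
and is never spooled; the trade-off is that the more specific container type is 
only detected when a file is already available.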



--
This message was sent by Atlassian JIRA
(v6.1#6144)
