Jukka Zitting created TIKA-1190:
-----------------------------------
Summary: ZipContainerDetector.detect() can spool the entire stream to a temporary file
Key: TIKA-1190
URL: https://issues.apache.org/jira/browse/TIKA-1190
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Jukka Zitting
Assignee: Jukka Zitting
As noted in a TODO comment, the {{ZipContainerDetector}} currently calls
{{getFile()}} on any {{TikaInputStream}} that looks like a ZIP archive without
first calling {{hasFile()}} to check whether a backing file is actually
available.
This is troublesome because, when no backing file exists, {{getFile()}} spools
the entire stream to a temporary file. That can cause an unexpected performance
hit, and the temporary file may not be needed at all after detection.
A better approach would be to run the more detailed "full file" format
detection only when a backing file is already available, i.e. when
{{hasFile()}} returns true.
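A minimal Java sketch of the proposed guard follows. It assumes a hypothetical stand-in class, SpoolableStream, that models only the hasFile()/getFile() contract described above. It is not Tika's TikaInputStream, and the real ZipContainerDetector returns MediaType objects rather than strings. The "application/zip" fallback and the string results are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class GuardedZipDetection {

    /** Hypothetical stand-in for TikaInputStream: models only hasFile()/getFile(). */
    static final class SpoolableStream {
        private final InputStream in;
        private File file;       // backing file, or null if the stream is not file-backed
        int spoolCount = 0;      // how many times getFile() had to spool to a temp file

        SpoolableStream(InputStream in, File file) {
            this.in = in;
            this.file = file;
        }

        /** Cheap check: is a backing file already available? */
        boolean hasFile() {
            return file != null;
        }

        /** Expensive if not file-backed: copies the whole stream to a temp file. */
        File getFile() throws IOException {
            if (file == null) {
                File tmp = File.createTempFile("spool", ".tmp");
                tmp.deleteOnExit();
                Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
                file = tmp;
                spoolCount++;
            }
            return file;
        }
    }

    /** Proposed rule: do "full file" detection only when a backing file already exists. */
    static String detect(SpoolableStream stream) {
        if (stream.hasFile()) {
            try {
                File f = stream.getFile(); // no spooling: the file is already there
                return "full-file detection on " + f.getName();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // No backing file: skip getFile() entirely (assumed generic ZIP fallback).
        return "application/zip";
    }

    public static void main(String[] args) {
        byte[] zipHeader = {'P', 'K', 3, 4};
        SpoolableStream s = new SpoolableStream(new ByteArrayInputStream(zipHeader), null);
        System.out.println(detect(s));    // application/zip
        System.out.println(s.spoolCount); // 0: nothing was spooled
    }
}
```

The point of the guard is that hasFile() is a constant-time check, so a stream without a backing file never pays the cost of copying its contents to disk just for detection.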
--
This message was sent by Atlassian JIRA
(v6.1#6144)