Tim Allison created TIKA-3976:
---------------------------------
Summary: Allow users to configure behavior for zero-byte files
Key: TIKA-3976
URL: https://issues.apache.org/jira/browse/TIKA-3976
Project: Tika
Issue Type: Task
Reporter: Tim Allison
AutoDetectParser currently throws a ZeroByteFileException whenever the input
stream is empty.
I _think_ the reason we did this was for use cases in search systems, where it
would be exceptional to send in a zero-byte file.
For other use cases, though, and especially for embedded files, it is fairly
normal to have zero-byte content alongside meaningful metadata.
Embedded files in general are one such case (as in .ppt, etc.). WARC redirects
and HTTPResponse files are other kinds of containers where the embedded file
may carry meaningful metadata even though its stream is zero bytes.
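To make the request concrete, here is a rough, standalone sketch of what a configurable zero-byte policy could look like. It does not use Tika's API: ZeroByteSketch, ZeroBytePolicy, EmptyStreamException, and the X-Sketch-* keys are all hypothetical names made up for illustration, and the "parse" step is just a byte count standing in for real parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Tika.
class ZeroByteSketch {

    /** Hypothetical setting: what to do when the stream has no bytes. */
    enum ZeroBytePolicy { THROW, RETURN_METADATA_ONLY }

    /** Stand-in for the exception thrown today on an empty stream. */
    static class EmptyStreamException extends IOException {
        EmptyStreamException(String message) {
            super(message);
        }
    }

    /**
     * Returns a copy of the supplied metadata. For an empty stream the
     * policy decides between throwing and returning the metadata as-is
     * (plus a marker); for a non-empty stream the byte count is recorded
     * in place of real parsing.
     */
    static Map<String, String> parse(InputStream in,
                                     Map<String, String> metadata,
                                     ZeroBytePolicy policy) throws IOException {
        Map<String, String> result = new LinkedHashMap<>(metadata);

        // Peek at one byte so the emptiness check does not consume content.
        PushbackInputStream peekable = new PushbackInputStream(in, 1);
        int first = peekable.read();
        if (first == -1) {
            if (policy == ZeroBytePolicy.THROW) {
                throw new EmptyStreamException("zero-byte stream");
            }
            result.put("X-Sketch-Empty", "true");
            return result;
        }
        peekable.unread(first);

        // Placeholder for detection and parsing: just count the bytes.
        long length = 0;
        while (peekable.read() != -1) {
            length++;
        }
        result.put("X-Sketch-Length", Long.toString(length));
        return result;
    }

    public static void main(String[] args) throws IOException {
        // E.g. a redirect record: headers worth keeping, but no body.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Location", "https://example.org/moved");

        InputStream empty = new ByteArrayInputStream(new byte[0]);
        System.out.println(
                parse(empty, metadata, ZeroBytePolicy.RETURN_METADATA_ONLY));
    }
}
```

With THROW the sketch mirrors the current behavior; with RETURN_METADATA_ONLY the caller still gets the container-supplied metadata for an empty embedded stream. Whether such a setting should live on the parser, in the ParseContext, or in tika-config is left open here.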
--
This message was sent by Atlassian Jira
(v8.20.10#820010)