Tim Allison created TIKA-3976:
---------------------------------

             Summary: Allow users to configure behavior for zero-byte files
                 Key: TIKA-3976
                 URL: https://issues.apache.org/jira/browse/TIKA-3976
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


AutoDetectParser currently throws a ZeroByteFileException whenever the input 
stream is empty.

I _think_ the reason we did this was for use cases in search systems, where it 
would be exceptional to send in a zero-byte file.

For other use cases, though, especially for embedded files, it is normal to 
have zero-byte contents alongside meaningful metadata.

Embedded files in general are one such case (as in .ppt, etc.). WARC redirects 
and HTTPResponse files are other types of containers where the embedded file 
may carry meaningful metadata even though its stream is zero bytes.
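
To make the idea concrete, here is a rough sketch of what a configurable 
zero-byte check could look like. It is plain Java with no Tika dependency, and 
every name in it (ZeroByteSketch, ZeroBytePolicy, shouldParse, the stand-in 
ZeroByteException) is hypothetical; none of it is existing or proposed Tika 
API. It only illustrates the two behaviors discussed above: throw on an empty 
stream, or skip parsing and let the caller keep whatever metadata it already 
has.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only: these names are NOT part of Tika's API.
class ZeroByteSketch {

    /** Hypothetical setting controlling what happens on an empty stream. */
    enum ZeroBytePolicy { THROW, SKIP_PARSE }

    /** Stand-in for Tika's ZeroByteFileException, so the sketch is self-contained. */
    static class ZeroByteException extends IOException {
        ZeroByteException(String msg) {
            super(msg);
        }
    }

    /** Returns true if the stream has no bytes; the stream position is restored. */
    static boolean isEmpty(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);
        try {
            return in.read() == -1;
        } finally {
            in.reset();
        }
    }

    /**
     * Returns true if the caller should go on to parse the stream.
     * Returns false if the stream is empty and the policy says to skip parsing,
     * in which case the caller can still report the metadata it has.
     */
    static boolean shouldParse(InputStream in, ZeroBytePolicy policy) throws IOException {
        if (!isEmpty(in)) {
            return true;
        }
        if (policy == ZeroBytePolicy.THROW) {
            throw new ZeroByteException("zero-byte stream");
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        InputStream oneByte = new ByteArrayInputStream(new byte[] {42});
        System.out.println(shouldParse(empty, ZeroBytePolicy.SKIP_PARSE));
        System.out.println(shouldParse(oneByte, ZeroBytePolicy.THROW));
    }
}
```

With THROW the sketch keeps today's behavior for search-style use cases. With 
SKIP_PARSE an embedded file with an empty stream would not be treated as an 
error. Where such a setting would actually live in Tika is left open here.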



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
