[ 
https://issues.apache.org/jira/browse/TIKA-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690528#comment-17690528
 ] 

ASF GitHub Bot commented on TIKA-3976:
--------------------------------------

tballison merged PR #972:
URL: https://github.com/apache/tika/pull/972




> Allow users to configure behavior for zero-byte files
> -----------------------------------------------------
>
>                 Key: TIKA-3976
>                 URL: https://issues.apache.org/jira/browse/TIKA-3976
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> We currently throw a ZeroByteFileException in AutoDetectParser whenever 
> the stream is empty.
> I _think_ the reason we did this was for use cases in search systems, where 
> it would be exceptional to send in a zero-byte file.
> For other use cases, though, especially for embedded files, it is fairly 
> normal to have zero-byte contents but meaningful metadata.
> Embedded files in general are one such case (as in .ppt, etc.), but WARC 
> redirects and HTTPResponse files are other types of containers whose 
> embedded files may carry meaningful metadata even though the embedded 
> stream itself is zero bytes.
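
To illustrate the behavior requested above, here is a minimal, self-contained sketch of a configurable zero-byte check. It is not Tika code: the class ZeroByteCheckSketch, the exception EmptyStreamException, the method names, and the throwOnZeroBytes flag are all hypothetical names chosen for this example, and the returned strings merely label which path was taken. It assumes the check is done by peeking a single byte and pushing it back.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical illustration only; none of these names are Tika's API.
class ZeroByteCheckSketch {

    /** Hypothetical stand-in for an exception like ZeroByteFileException. */
    static class EmptyStreamException extends IOException {
        EmptyStreamException(String msg) {
            super(msg);
        }
    }

    /**
     * Peeks one byte to test for an empty stream, then pushes it back
     * so the caller can still read the full content.
     */
    static boolean isEmpty(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return true;
        }
        in.unread(b);
        return false;
    }

    /**
     * With throwOnZeroBytes == true, an empty stream is treated as an error
     * (the search-system use case). With false, the caller continues and can
     * keep whatever metadata it has (the embedded-file use case).
     */
    static String handle(InputStream raw, boolean throwOnZeroBytes) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        if (isEmpty(in)) {
            if (throwOnZeroBytes) {
                throw new EmptyStreamException("stream has zero bytes");
            }
            return "metadata-only";
        }
        return "parsed";
    }
}
```

A caller handling embedded files might pass false so that an empty attachment still yields a metadata record, while a search indexer might pass true to surface empty uploads as errors.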



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
