[ https://issues.apache.org/jira/browse/TIKA-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690573#comment-17690573 ]
Hudson commented on TIKA-3976:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #1028 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/1028/])
TIKA-3976 (#972) (github:
[https://github.com/apache/tika/commit/e48b10fe917b47cb7660227e558c8be4e15a84dd])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/AutoDetectParserConfigTest.java
* (edit) CHANGES.txt
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParserConfig.java
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-digests.xml
> Allow users to configure behavior for zero-byte files
> -----------------------------------------------------
>
> Key: TIKA-3976
> URL: https://issues.apache.org/jira/browse/TIKA-3976
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 2.7.1
>
>
> We currently throw a ZeroByteFileException whenever the stream is empty in
> AutoDetectParser.
> I _think_ the reason we did this was for use cases in search systems, where
> it would be exceptional to send in a zero-byte file.
> For other use cases, though, especially for embedded files, it is fairly
> normal to have zero-byte content but meaningful metadata.
> Embedded files in general are one such case (as in .ppt, etc.); WARC
> redirects and HTTPResponse files are other types of containers where the
> embedded file may carry meaningful metadata even though its stream is
> zero bytes.
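
The behavior requested in the issue can be illustrated with a minimal,
self-contained sketch. It does not use Tika and is not the code from the
commit linked above: the class ZeroByteSketch, the nested Config class, the
flag name throwOnZeroBytes and the helper outcome() are all hypothetical
names chosen for illustration, and the nested ZeroByteFileException is only
a stand-in for the exception named in the issue. The sketch peeks at the
first byte of the stream and either throws or carries on with an empty
body, depending on the flag.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

class ZeroByteSketch {

    /** Hypothetical stand-in for the exception named in the issue. */
    static class ZeroByteFileException extends IOException {
        ZeroByteFileException(String message) {
            super(message);
        }
    }

    /** Hypothetical config holder; the flag name is an assumption. */
    static class Config {
        final boolean throwOnZeroBytes;

        Config(boolean throwOnZeroBytes) {
            this.throwOnZeroBytes = throwOnZeroBytes;
        }
    }

    /**
     * Peeks at the first byte. On an empty stream, either throws or returns 0
     * so that the caller can still process any metadata it already holds.
     * Otherwise returns the number of bytes read from the stream.
     */
    static long parse(InputStream in, Config config) throws IOException {
        PushbackInputStream stream = new PushbackInputStream(in, 1);
        int first = stream.read();
        if (first == -1) {
            if (config.throwOnZeroBytes) {
                throw new ZeroByteFileException("stream is empty");
            }
            return 0L; // empty body tolerated
        }
        stream.unread(first); // put the peeked byte back before "parsing"
        long count = 0;
        while (stream.read() != -1) {
            count++;
        }
        return count;
    }

    /** Convenience wrapper that reports the outcome as a string. */
    static String outcome(byte[] data, boolean throwOnZeroBytes) {
        try {
            long n = parse(new ByteArrayInputStream(data), new Config(throwOnZeroBytes));
            return "parsed " + n + " bytes";
        } catch (ZeroByteFileException e) {
            return "threw";
        } catch (IOException e) {
            return "io error";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(new byte[0], true));
        System.out.println(outcome(new byte[0], false));
        System.out.println(outcome(new byte[] {1, 2, 3}, true));
    }
}
```

With the flag set to true the sketch keeps the strict behavior that suits
the search-system use case; with false, an embedded file with an empty
stream no longer aborts processing.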