Matthew Caruana Galizia created TIKA-2450:
---------------------------------------------
Summary: OfficeParser.parse called for zero-byte file with .doc extension
Key: TIKA-2450
URL: https://issues.apache.org/jira/browse/TIKA-2450
Project: Tika
Issue Type: Bug
Components: detector, parser
Affects Versions: 1.16
Reporter: Matthew Caruana Galizia
Priority: Minor
A zero-byte (empty) file with a .doc extension is detected as a Word Document
and the {{OfficeParser.parse}} method is called for this file.
Parsing then fails with a {{TikaException}} whose cause is an
{{org.apache.poi.EmptyFileException}}.
I think it would be more useful if the file were NOT detected as a Word
Document, meaning that the {{AutoDetectParser}} would then fall back to
whatever is set as the fallback parser in the parse context.
That behaviour is more useful because the user can then apply special logic
for handling empty files in the fallback parser.
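
Until detection changes, a caller could work around this by checking for a zero-byte stream before handing it to the parser. The following is only a minimal sketch of that idea using the JDK alone; it is not Tika's implementation, and the class name {{EmptyStreamCheck}} and its helpers {{markable}} and {{isEmpty}} are hypothetical names introduced for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: detects zero-byte input up front.
public class EmptyStreamCheck {

    // Wrap the stream so that mark/reset is available.
    public static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Returns true if the stream has no bytes. The stream position is
    // restored afterwards, so the stream can still be parsed normally.
    public static boolean isEmpty(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);
        try {
            return in.read() == -1;
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream empty = markable(new ByteArrayInputStream(new byte[0]));
        // First two bytes of the OLE2 header, standing in for a real .doc file.
        InputStream doc = markable(
                new ByteArrayInputStream(new byte[] {(byte) 0xD0, (byte) 0xCF}));

        System.out.println(isEmpty(empty)); // true
        System.out.println(isEmpty(doc));   // false
    }
}
```

A caller could run {{isEmpty}} on the stream first and route empty files to its own handling, only invoking the {{AutoDetectParser}} when the stream has content. This is a client-side workaround and does not replace fixing the detection itself.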