Matthew Caruana Galizia created TIKA-2450:
---------------------------------------------

             Summary: OfficeParser.parse called for zero-byte file with .doc extension
                 Key: TIKA-2450
                 URL: https://issues.apache.org/jira/browse/TIKA-2450
             Project: Tika
          Issue Type: Bug
          Components: detector, parser
    Affects Versions: 1.16
            Reporter: Matthew Caruana Galizia
            Priority: Minor


A zero-byte (empty) file with a .doc extension is detected as a Word document, 
and {{OfficeParser.parse}} is then called on it.

The call fails with a {{TikaException}} whose cause is an 
{{org.apache.poi.EmptyFileException}}.

I think it would be more useful if the file were NOT detected as a Word 
document, so that the {{AutoDetectParser}} falls back to whatever is set as 
the fallback parser in the parse context.

The user could then trigger special logic for handling empty files there, 
instead of having to unwrap the exception to find out that the file was empty.
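
To illustrate the kind of special handling I mean, here is a minimal sketch 
of a caller-side check that routes zero-byte files away from the detected 
parser. It uses only the JDK and does not call Tika; {{EmptyFileGuard}}, 
{{Route}} and {{route}} are hypothetical names made up for this example, not 
part of Tika's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyFileGuard {

    /** Hypothetical routing decision; these names are not part of Tika. */
    public enum Route { EMPTY_FILE_HANDLER, DETECTED_PARSER }

    /**
     * Decides where a file should go before any format-specific parser runs.
     * A zero-byte file is sent to the empty-file handler regardless of its
     * extension; everything else goes to whatever parser detection selects.
     */
    public static Route route(Path file) throws IOException {
        return Files.size(file) == 0L
                ? Route.EMPTY_FILE_HANDLER
                : Route.DETECTED_PARSER;
    }

    public static void main(String[] args) throws IOException {
        // Create a zero-byte file with a .doc extension, as in this report.
        Path empty = Files.createTempFile("empty", ".doc");
        try {
            System.out.println(route(empty));
        } finally {
            Files.deleteIfExists(empty);
        }
    }
}
```

With detection fixed as proposed, the same decision could live in the 
fallback parser and no check before the call would be needed.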



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
