[ https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147303#comment-16147303 ]
Matthew Caruana Galizia commented on TIKA-2450:
-----------------------------------------------

I would argue that the raison d'etre of tika-detect is not to provide extension-based detection, but to provide detection. A zero-byte file can never be a Word Document, so, assuming my first statement is true, it logically should not be detected as a Word Document.

> OfficeParser.parse called for zero-byte file with .doc extension
> ----------------------------------------------------------------
>
>                 Key: TIKA-2450
>                 URL: https://issues.apache.org/jira/browse/TIKA-2450
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.16
>            Reporter: Matthew Caruana Galizia
>            Priority: Minor
>
> A zero-byte (empty) file with a .doc extension is detected as a Word Document
> and the {{OfficeParser.parse}} method is called for this file.
> We then get a {{TikaException}}, with the cause given as an
> {{org.apache.poi.EmptyFileException}}.
> I think it would be more useful if the file were NOT detected as a Word
> Document, meaning that the {{AutoDetectParser}} would then fall back to
> whatever is set as the fallback parser in the parse context.
> This is more useful because the user can then trigger some special logic for
> handling empty files.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
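
Until detection itself changes, one possible caller-side workaround is to check for a zero-byte file before handing it to the parser, so the "special logic for handling empty files" runs without going through {{OfficeParser}} at all. The following is only a minimal sketch using the Java standard library; the class name {{EmptyFileGuard}}, the method names, and the "empty"/"parse" labels are hypothetical and are not part of Tika.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika: decides whether a file should be
// handed to the parser or routed to separate empty-file handling.
final class EmptyFileGuard {

    private EmptyFileGuard() {
    }

    // True only for an existing regular file whose size is exactly 0 bytes.
    static boolean isZeroByte(Path path) throws IOException {
        return Files.isRegularFile(path) && Files.size(path) == 0L;
    }

    // Returns "empty" for zero-byte files and "parse" for everything else.
    // In a real caller, the "parse" branch is where the file would be
    // passed on to the parser (assumption: e.g. an AutoDetectParser).
    static String route(Path path) throws IOException {
        return isZeroByte(path) ? "empty" : "parse";
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + route(Paths.get(arg)));
        }
    }
}
```

This only sidesteps the problem for callers that have a file on disk; it does not change what the detector reports, and it does not help when the input is only available as a stream.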