[ 
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147874#comment-16147874
 ] 

Luis Filipe Nassif commented on TIKA-2450:
------------------------------------------

Late to the party... In the forensic field, it is very useful to know that a 
recovered, corrupted zero-byte file named "sex with 10 yo child.jpg" was once a 
picture.

Currently we test whether the file is zero length before sending it to 
AutoDetectParser, to avoid confusing parser errors, but that is not possible 
with streams...

So +1 to a unique ZeroByteFileException.
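
For the stream case, one possible workaround is to peek a single byte before 
parsing. This is a minimal sketch using only java.io; the names 
EmptyStreamCheck, wrap and isEmpty are hypothetical and not part of Tika:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Tika: detects an empty stream
// without losing the first byte.
final class EmptyStreamCheck {

    // Wrap the stream with a one-byte pushback buffer so a byte can be peeked.
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, 1);
    }

    // Returns true if the stream has no bytes left.
    // If a byte was read, it is pushed back so later reads still see it.
    static boolean isEmpty(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return true;
        }
        in.unread(b);
        return false;
    }
}
```

The caller would then hand the wrapped stream, not the original one, to the 
parser, so the peeked byte is not lost.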

> OfficeParser.parse called for zero-byte file with .doc extension
> ----------------------------------------------------------------
>
>                 Key: TIKA-2450
>                 URL: https://issues.apache.org/jira/browse/TIKA-2450
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.16
>            Reporter: Matthew Caruana Galizia
>            Priority: Minor
>             Fix For: 1.17
>
>
> A zero-byte (empty) file with a .doc extension is detected as a Word Document 
> and the {{OfficeParser.parse}} method is called for this file.
> We then get a {{TikaException}}, with the cause given as an 
> {{org.apache.poi.EmptyFileException}}.
> I think it would be more useful if the file were NOT detected as a Word 
> Document, meaning that the {{AutoDetectParser}} would then fall back to 
> whatever is set as the fallback parser in the parse context.
> This is more useful because the user can then trigger some special logic for 
> handling empty files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
