Tim Allison created TIKA-1651:
---------------------------------

             Summary: Excel files embedded in ppt and xls seem to have a high rate of exceptions in govdocs1
                 Key: TIKA-1651
                 URL: https://issues.apache.org/jira/browse/TIKA-1651
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


I haven't had a chance to look into this yet, but I wanted to open an issue 
to track it.  With recently modified tika eval dev code that captures 
exceptions from embedded documents, there are ~30k exceptions in govdocs1 for 
xls files embedded in ppt and xls files.

There's a chance that something went wrong with the eval code, and there's a 
chance that these files are mis-typed, but we should take a look.

Example files to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
