[ 
https://issues.apache.org/jira/browse/TIKA-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574362#comment-14574362
 ] 

Tim Allison edited comment on TIKA-1651 at 6/5/15 11:47 AM:
------------------------------------------------------------

This is the xls file that is extracted from 428996.ppt.  Excel can't open it 
either: 'Can't open Microsoft Graph chart gallery files'.  However, a quick 
look in a hex editor shows what look like good old OLE objects and clear plain 
text from the image that could be extracted.
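
For anyone who wants to repeat the hex-editor check without a hex editor, here 
is a rough sketch in Python.  It is not Tika or POI code and makes no claim 
about how either would handle this file; the function names are made up for 
illustration.  It only tests for the standard OLE2 compound-file signature and 
pulls out runs of printable text, roughly what strings(1) would show.

```python
import re
import sys
from typing import List

# OLE2 / Compound File Binary signature: the first 8 bytes of legacy
# .xls, .ppt, and .doc containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """True if the buffer starts with the OLE2 compound-file signature."""
    return data.startswith(OLE2_MAGIC)


def printable_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of at least min_len printable ASCII bytes (single-byte text)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def utf16le_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of printable ASCII stored as UTF-16LE (each char followed by 0x00)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_ole.py <path to the extracted xls>
    with open(sys.argv[1], "rb") as fh:
        raw = fh.read()
    print("OLE2 signature:", looks_like_ole2(raw))
    for text in printable_runs(raw) + utf16le_runs(raw):
        print(text)
```

This is a byte-level scan only.  It does not walk the OLE2 directory or parse 
any records, so it can show that text is present but not which stream or 
record it belongs to.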


was (Author: [email protected]):
This is the xls file that is extracted from 428996.ppt.  Excel can't open it 
either...'Can't open Microsoft Graph chart gallery files'

> Excel files embedded in ppt and xls seem to have a high rate of exceptions in 
> govdocs1
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-1651
>                 URL: https://issues.apache.org/jira/browse/TIKA-1651
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>         Attachments: 11.xls, 428996.ppt, embedded_xls_stack_traces.csv
>
>
> I haven't had a chance to look into this at all, but I wanted to open an 
> issue to track this.  With recently modified tika eval dev code that captures 
> exceptions from embedded documents, there are ~30k exceptions in govdocs1 for 
> xls files embedded in ppt and xls files. 
> There's a chance that something went wrong with the eval code, and there's a 
> chance that these files are mis-typed, but we should take a look.
> Example files to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
