[
https://issues.apache.org/jira/browse/TIKA-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574355#comment-14574355
]
Tim Allison edited comment on TIKA-1651 at 6/5/15 11:43 AM:
------------------------------------------------------------
One example file and a group by on a somewhat reduced version of the stack
traces. Looks like they might roughly boil down to the same issue.
From the few I've opened, these are xls charts, not full xls files, embedded
in ppt.
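For anyone who wants to reproduce a similar grouping, here is a minimal
sketch in Python. It is not the code used to build the attached csv; the
reduction rule (keep the exception class plus the top few frames, drop
messages and line numbers) and the names reduce_trace and group_traces are
assumptions made for illustration only.

```python
import re
from collections import Counter


def reduce_trace(trace: str, max_frames: int = 3) -> str:
    """Collapse a Java stack trace to exception class plus its top frames.

    Assumed reduction rule (hypothetical, not taken from the attachment):
    drop the exception message and the "(File.java:123)" suffixes so that
    traces differing only in those details land in the same group.
    """
    lines = [ln.strip() for ln in trace.strip().splitlines() if ln.strip()]
    if not lines:
        return ""
    # Keep only the exception class; the message follows the first colon.
    head = lines[0].split(":", 1)[0].strip()
    frames = []
    for ln in lines[1:]:
        if ln.startswith("at "):
            # Strip the trailing "(...)" source location from the frame.
            frames.append(re.sub(r"\(.*\)$", "", ln[3:]))
        if len(frames) == max_frames:
            break
    return " <- ".join([head] + frames)


def group_traces(traces):
    """Return (reduced_trace, count) pairs, most frequent first."""
    return Counter(reduce_trace(t) for t in traces).most_common()
```

Calling group_traces on a list of raw trace strings gives the reduced traces
with their counts, which makes it easier to see whether most of the
exceptions share one root cause.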
was (Author: [email protected]):
One example file and a group by on a somewhat reduced version of the stack
traces. Looks like they might boil down to the same issue.
> Excel files embedded in ppt and xls seem to have a high rate of exceptions in
> govdocs1
> --------------------------------------------------------------------------------------
>
> Key: TIKA-1651
> URL: https://issues.apache.org/jira/browse/TIKA-1651
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Attachments: 428996.ppt, embedded_xls_stack_traces.csv
>
>
> I haven't had a chance to look into this at all, but I wanted to open an
> issue to track this. With recently modified tika eval dev code that captures
> exceptions from embedded documents, there are ~30k exceptions in govdocs1 for
> xls files embedded in ppt and xls files.
> There's a chance that something went wrong with the eval code, and there's a
> chance that these files are mis-typed, but we should take a look.
> Example files to follow.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)