[jira] [Commented] (TIKA-3030) XLS files with a root node named WORKBOOK don't get parsed

Tim Allison (Jira) Tue, 28 Jan 2020 14:28:23 -0800


    [ https://issues.apache.org/jira/browse/TIKA-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025472#comment-17025472 ]


Tim Allison commented on TIKA-3030:
-----------------------------------

Thank you for raising this issue.  As you point out, the fix is easy.  Are you 
able to share the triggering file with us?  I'd be curious to see whether it 
was actually created by Microsoft or by another tool; perhaps something in the 
metadata would tell us.  This is pure curiosity and has nothing to do with the 
fix.

> XLS files with a root node named WORKBOOK don't get parsed
> ----------------------------------------------------------
>
>                 Key: TIKA-3030
>                 URL: https://issues.apache.org/jira/browse/TIKA-3030
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.23
>            Reporter: Clark Perkins
>            Priority: Major
>
> I have an XLS file where the root node contains 2 top-level names - 
> "WORKBOOK" and " SummaryInformation".
> The type gets properly detected as "application/vnd.ms-excel", because the 
> POIFSContainerDetector does a check like so:
> {noformat}
> if (names.contains("Workbook") || names.contains("WORKBOOK")) {
>     ...
> }{noformat}
> However, the ExcelExtractor silently rejects the file because the root node 
> doesn't contain a top-level node named "Workbook".
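
The mismatch described above (the detector accepts two spellings, the
extractor only one) could be avoided by having both sides share a single
lookup. The following is a minimal sketch of that idea, not the actual Tika
or POI code: the class WorkbookEntryNames and its find method are
hypothetical names, and the candidate list contains only the two spellings
quoted in the detector check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; not part of Tika or POI.
final class WorkbookEntryNames {

    // Only the two spellings from the quoted detector check (an assumption;
    // a real fix may need to cover further variants).
    private static final List<String> CANDIDATES =
            Arrays.asList("Workbook", "WORKBOOK");

    private WorkbookEntryNames() {
    }

    /**
     * Returns the first candidate spelling that appears among the root
     * node's top-level entry names, or an empty Optional if none does.
     */
    static Optional<String> find(Set<String> rootEntryNames) {
        return CANDIDATES.stream()
                .filter(rootEntryNames::contains)
                .findFirst();
    }
}
```

If detection and extraction both resolved the entry name through one such
lookup, a file that is detected as application/vnd.ms-excel would not later
be rejected only because of the spelling of its workbook entry.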



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
