[ https://issues.apache.org/jira/browse/TIKA-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2104:
------------------------------
Attachments: newExceptionsInBDetails.xlsx, newExceptionsInBByMimeTypeByStackTrace.xlsx
I ran our batch code against ~800k MSOffice files without swallowing exceptions
from macro extraction, and I'm attaching the results. We can use these to
identify the most common exceptions and prioritize fixing them.
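One way to turn a run like this into a prioritized list is to tally the exceptions by MIME type and then by stack trace, which is the breakdown one of the attached spreadsheets uses. The sketch below is hypothetical: `MacroExceptionTally` and `simulateMacroExtraction` are illustrative names, not part of Tika, tika-batch or POI. It also assumes that bucketing on the exception class plus the top three stack frames, while ignoring the message, is a reasonable proxy for "same bug".

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Tika or tika-batch): tallies exceptions
 * thrown during macro extraction by MIME type, then by normalized stack trace.
 */
public class MacroExceptionTally {

    // mimeType -> (stack-trace key -> count); TreeMap keeps the report sorted
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Records one exception thrown while extracting macros from a file of this MIME type. */
    public void record(String mimeType, Throwable t) {
        counts.computeIfAbsent(mimeType, k -> new TreeMap<>())
              .merge(stackKey(t), 1, Integer::sum);
    }

    public Map<String, Map<String, Integer>> snapshot() {
        return counts;
    }

    /**
     * Key = exception class + top 3 frames. The message is deliberately left out,
     * so the same bug hit by different files (different messages) lands in one bucket.
     */
    static String stackKey(Throwable t) {
        StringBuilder sb = new StringBuilder(t.getClass().getName());
        StackTraceElement[] frames = t.getStackTrace();
        for (int i = 0; i < Math.min(3, frames.length); i++) {
            sb.append("\n\tat ").append(frames[i]);
        }
        return sb.toString();
    }

    // Hypothetical stand-in for the real parser call; always throws with a file-specific message.
    static void simulateMacroExtraction(String fileName) {
        throw new IllegalStateException("bad macro stream in " + fileName);
    }

    public static void main(String[] args) {
        MacroExceptionTally tally = new MacroExceptionTally();
        String[] files = {"a.doc", "b.doc", "c.xls"};
        String[] mimes = {"application/msword", "application/msword", "application/vnd.ms-excel"};

        for (int i = 0; i < files.length; i++) {
            try {
                simulateMacroExtraction(files[i]);
            } catch (RuntimeException e) {
                // Record instead of swallowing, so the failure shows up in the report
                tally.record(mimes[i], e);
            }
        }

        // One line per (MIME type, stack-trace bucket): mime, count, exception class
        tally.snapshot().forEach((mime, byTrace) ->
            byTrace.forEach((trace, n) ->
                System.out.println(mime + "\t" + n + "\t" + trace.split("\n")[0])));
    }
}
```

Because the two `.doc` failures throw from the same frames and differ only in their messages, they collapse into a single bucket with a count of 2. Sorting buckets by count then gives a rough "fix this first" ordering.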
> Upgrade to a version of POI that fixes common bugs in macro extraction, when available
> --------------------------------------------------------------------------------------
>
> Key: TIKA-2104
> URL: https://issues.apache.org/jira/browse/TIKA-2104
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Attachments: newExceptionsInBByMimeTypeByStackTrace.xlsx,
> newExceptionsInBDetails.xlsx
>
>
> On TIKA-2069, we found two bugs in POI that prevented the extraction of
> macros from MSOffice files. Let's use this issue to track fixes in POI.
> Currently known POI bugs:
> 60162
> 60158
> 59830
> 59858
> After we release Tika 1.14, let's remove the catch blocks in Tika and rerun
> against our regression corpus to help identify the most common bugs and find
> new ones.
> As always, patches are welcome on POI!