[ https://issues.apache.org/jira/browse/TIKA-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-2104:
------------------------------
    Attachment: newExceptionsInBDetails.xlsx
                newExceptionsInBByMimeTypeByStackTrace.xlsx

I ran our batch code against ~800k MSOffice files without swallowing exceptions
from macro extraction. I'm attaching the results, which we can use to identify
the most common exceptions and prioritize fixing them.
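For anyone who wants to produce a similar summary from their own run, here is a minimal, hypothetical sketch of tallying failures by MIME type and by stack trace. It is not the batch/eval code that generated the attached spreadsheets; the class `ExceptionTally`, the `Failure` holder and the `depth` parameter are illustrative names only. The grouping key is assumed to be the exception class plus the top few frames (class and method only), so traces that differ only in message or line number land in the same bucket.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts failures per MIME type and per "stack signature".
public class ExceptionTally {

    // One failed file: its detected MIME type and the exception that escaped parsing.
    public static final class Failure {
        final String mimeType;
        final Throwable error;

        public Failure(String mimeType, Throwable error) {
            this.mimeType = mimeType;
            this.error = error;
        }
    }

    // Signature = exception class + the top `depth` frames (class.method only).
    // Messages and line numbers are ignored so near-identical traces group together.
    static String stackKey(Throwable t, int depth) {
        StringBuilder sb = new StringBuilder(t.getClass().getName());
        StackTraceElement[] frames = t.getStackTrace();
        for (int i = 0; i < Math.min(depth, frames.length); i++) {
            sb.append(" | ")
              .append(frames[i].getClassName())
              .append('.')
              .append(frames[i].getMethodName());
        }
        return sb.toString();
    }

    // MIME type -> (stack signature -> count); TreeMaps keep the report sorted by key.
    public static Map<String, Map<String, Integer>> tally(List<Failure> failures, int depth) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Failure f : failures) {
            counts.computeIfAbsent(f.mimeType, k -> new TreeMap<>())
                  .merge(stackKey(f.error, depth), 1, Integer::sum);
        }
        return counts;
    }
}
```

A small `depth` (a handful of frames) tends to merge failures that share a root cause in POI; a larger one splits them by caller, which is useful once a bucket has been fixed and you want to see what remains.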

> Upgrade to a version of POI that fixes common bugs in macro extraction, when available
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-2104
>                 URL: https://issues.apache.org/jira/browse/TIKA-2104
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>         Attachments: newExceptionsInBByMimeTypeByStackTrace.xlsx, newExceptionsInBDetails.xlsx
>
>
> On TIKA-2069, we found two bugs in POI that prevented the extraction of 
> macros from MSOffice files.  Let's use this issue to track fixes in POI.
> Currently known POI bugs:
> 60162
> 60158
> 59830
> 59858
> After we release Tika 1.14, let's remove the catch blocks in Tika and rerun 
> against our regression corpus to help identify the most common bugs and find 
> new ones.
> As always, patches are welcome on POI!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
