[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340249#comment-17340249
 ] 

Tim Allison edited comment on TIKA-3164 at 5/6/21, 2:56 PM:
------------------------------------------------------------

Reports are here: 
https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz

These compare the latest POI 4.x vs. 5.0.1-snapshot. There's a new NPE in WMF 
parsing, and it looks like we're missing a bunch of attachments.

I also need to look into why there's less content coming out of 
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ... 

Parse times seem to be slower for OOXML than in 4.x, but that could be an 
artifact of the mood of the VM at the time of running...

The missing attachments and the reduced spreadsheetml content could be Tika 
issues, not POI. I need to take a look.



was (Author: [email protected]):
Reports are here: 
https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz

These compare the latest 4.x vs. 5.0.1-snapshot.  There's a new NPE in WMF 
parsing, and it looks like we're missing a bunch of attachments.

I also need to look into why there's less content coming out of 
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ... this 
could be a Tika item, not POI...

Parse times seem to be slower for ooxml than in 4.x, but that could be an 
artifact of the mood of the vm at the time of running...

> Upgrade to POI 5.0.0 when available
> -----------------------------------
>
>                 Key: TIKA-3164
>                 URL: https://issues.apache.org/jira/browse/TIKA-3164
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
