[ https://issues.apache.org/jira/browse/TIKA-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3627.
-------------------------------
Fix Version/s: 2.2.1
Resolution: Fixed
This was caused by an error I made when downgrading POI from 5.x back to 4.x.
We stopped 2.2.1-rc2 because of this and respun it as 2.2.1-rc3. Thank you for
reporting this.
> OOXML parsing is not working as intended using multiple threads
> ---------------------------------------------------------------
>
> Key: TIKA-3627
> URL: https://issues.apache.org/jira/browse/TIKA-3627
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Bernhard Geisberger
> Priority: Blocker
> Fix For: 2.2.1
>
>
> In the latest version, the parsing of OOXML files is broken if multiple
> threads are used. I investigated and compared the call stack between 2.1.0
> and 2.2.0, and came to the conclusion that this is caused by [this
> commit|https://github.com/apache/tika/commit/10d925439cd862f74679ec5fa9a9b5863f50ce2c]
> in line 86 of OOXMLExtractorFactory.
> In version 2.1.0, `ExtractorFactory.setThreadPrefersEventExtractors(true)`
> is called in every `parse` call, which sets the thread-local property on
> every thread that parses. In version 2.2.0, the call was moved into a static
> block, so it runs only once, on the thread that first loads the class. The
> property therefore keeps its default value (false) on all threads other than
> the first one. Effectively, this breaks the parsing of macros in OOXML files.
> An easy workaround in version 2.2.0 is to call
> `ExtractorFactory.setAllThreadsPreferEventExtractors(true)` at some point
> before Tika is first used.
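
The thread-local behavior described in the report can be reproduced without
Tika or POI. The following is a minimal sketch, not the actual Tika or POI
code: the class `ThreadLocalStaticInitDemo`, the `PREFERS_EVENT` flag, and the
helper methods are hypothetical stand-ins chosen for illustration. It assumes
only that the real flag is a `ThreadLocal` with a default of false, as the
report states.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalStaticInitDemo {
    // Hypothetical stand-in for the per-thread preference; defaults to false.
    static final ThreadLocal<Boolean> PREFERS_EVENT =
            ThreadLocal.withInitial(() -> false);

    static {
        // A static block runs once, on the thread that initializes the class,
        // so only that thread's copy of the flag is set here.
        PREFERS_EVENT.set(true);
    }

    // Mirrors the 2.2.0 pattern: relies on the static block alone.
    static boolean readFlag() {
        return PREFERS_EVENT.get();
    }

    // Mirrors the 2.1.0 pattern: sets the flag on the calling thread each time.
    static boolean readFlagSettingPerCall() {
        PREFERS_EVENT.set(true);
        return PREFERS_EVENT.get();
    }

    // Runs the task on a freshly created worker thread and returns its result.
    static boolean onNewThread(Callable<Boolean> task) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // The main thread initialized the class, so it sees the value set above.
        System.out.println("initializing thread: " + readFlag());
        // A new thread never ran the static block and sees the default.
        System.out.println("worker, static block only: "
                + onNewThread(ThreadLocalStaticInitDemo::readFlag));
        // Setting the flag inside the call works on any thread.
        System.out.println("worker, set per call: "
                + onNewThread(ThreadLocalStaticInitDemo::readFlagSettingPerCall));
    }
}
```

In this sketch the first and third lines print true and the second prints
false, which matches the symptom in the report: every thread other than the
one that loaded the class sees the default value.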