Bernhard Geisberger created TIKA-3627:
-----------------------------------------
             Summary: OOXML parsing is not working as intended using multiple threads
                 Key: TIKA-3627
                 URL: https://issues.apache.org/jira/browse/TIKA-3627
             Project: Tika
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: Bernhard Geisberger


In the latest version, the parsing of OOXML files is broken if multiple threads are used. I investigated and compared the call stack between 2.1.0 and 2.2.0, and came to the conclusion that this is caused by [this commit|https://github.com/apache/tika/commit/10d925439cd862f74679ec5fa9a9b5863f50ce2c] in line 86 of OOXMLExtractorFactory.

In version 2.1.0, `ExtractorFactory.setThreadPrefersEventExtractors(true)` is called in every `parse` call, which sets the thread-local property on every thread that parses. In version 2.2.0, the call was moved into the static block. As a result, the property is set only on the thread that runs the static initializer and keeps its default value (false) on all other threads. Effectively, this breaks the parsing of macros in OOXML files.

An easy workaround in version 2.2.0 is to call `ExtractorFactory.setAllThreadsPreferEventExtractors(true)` at some point before Tika is first used.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
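To illustrate the mechanism described in the report, here is a minimal, dependency-free sketch of why a thread-local value set in a static block is only visible on one thread. It does not use Tika or POI. The class names `StaticInitFactory` and `PerCallFactory`, the `PREFERS_EVENT` fields, and the helper `callOnNewThread` are all hypothetical stand-ins for the 2.2.0 and 2.1.0 behaviour, not the actual code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalStaticInitDemo {

    // Hypothetical stand-in for the 2.2.0 behaviour: the per-thread flag is
    // set once, in the static block, on whichever thread initializes the class.
    static final class StaticInitFactory {
        static final ThreadLocal<Boolean> PREFERS_EVENT =
                ThreadLocal.withInitial(() -> false);

        static {
            PREFERS_EVENT.set(true); // affects only the initializing thread
        }

        static boolean parse() {
            return PREFERS_EVENT.get();
        }
    }

    // Hypothetical stand-in for the 2.1.0 behaviour: the flag is set on the
    // calling thread at the start of every parse call.
    static final class PerCallFactory {
        static final ThreadLocal<Boolean> PREFERS_EVENT =
                ThreadLocal.withInitial(() -> false);

        static boolean parse() {
            PREFERS_EVENT.set(true);
            return PREFERS_EVENT.get();
        }
    }

    // Runs the call on a freshly created worker thread and returns its result.
    static boolean callOnNewThread(Callable<Boolean> call) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(call).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The first use happens on the main thread, so the static block runs here.
        System.out.println("static init, first thread: " + StaticInitFactory.parse());
        // A different thread still sees the default value.
        System.out.println("static init, other thread: "
                + callOnNewThread(StaticInitFactory::parse));
        // Setting the flag per call works on any thread.
        System.out.println("per call, other thread:    "
                + callOnNewThread(PerCallFactory::parse));
    }
}
```

When run through `main`, the first line should report `true`, the second `false`, and the third `true`, which matches the difference between 2.2.0 and 2.1.0 described above. The workaround avoids the problem because, as its name suggests, `setAllThreadsPreferEventExtractors` is not a per-thread setting.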