Tim Kingsbury created TIKA-2170:
-----------------------------------
Summary: Tika 1.13 ForkParser fails intermittently with very large
MS Word docx
Key: TIKA-2170
URL: https://issues.apache.org/jira/browse/TIKA-2170
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.13
Environment: Windows 10
Reporter: Tim Kingsbury
When the ForkParser is run repeatedly in a for-loop against a single large
Microsoft Word DOCX file, it fails intermittently. Sometimes it fails on the
very first iteration; sometimes it runs through several iterations before
failing. The iteration at which it fails varies from run to run.
A small test application is enclosed. For the test, I use a Word DOCX containing
the full text of "War and Peace" (2.8 MB, 1141 pages of text).
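
For readers without the attachment, the loop described above might look
roughly like the sketch below. This is not the enclosed test application:
the class name RepeatedParseHarness, the ParseAttempt interface and the
iteration count are hypothetical, and the parse step is replaced by a
stand-in so the sketch runs without Tika on the classpath. The comment in
main shows where the real ForkParser call would presumably go.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness for illustration only; not the test application
// attached to the issue.
final class RepeatedParseHarness {

    /** One parse attempt. In a real run this would wrap a ForkParser.parse(...) call. */
    interface ParseAttempt {
        void run(int iteration) throws Exception;
    }

    /** Runs the attempt the given number of times and returns the 1-based iterations that threw. */
    static List<Integer> failedIterations(int iterations, ParseAttempt attempt) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            try {
                attempt.run(i);
            } catch (Exception e) {
                // Record the failure and keep looping so later iterations are still exercised.
                failed.add(i);
                System.err.println("iteration " + i + " failed: " + e);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in attempt that fails on every third iteration (an assumption,
        // chosen only to make the sketch deterministic). With Tika available,
        // the attempt would instead open the DOCX as an InputStream and call
        //   forkParser.parse(stream, new BodyContentHandler(-1),
        //                    new Metadata(), new ParseContext());
        List<Integer> failed = failedIterations(10, i -> {
            if (i % 3 == 0) {
                throw new IllegalStateException("simulated failure");
            }
        });
        System.out.println("failed iterations: " + failed);
    }
}
```

Catching the exception inside the loop, rather than letting it end the run,
makes it possible to see whether failures cluster at particular iterations.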