Tim Kingsbury created TIKA-2170:
-----------------------------------

             Summary: Tika 1.13 ForkParser fails intermittently with very large MS Word docx
                 Key: TIKA-2170
                 URL: https://issues.apache.org/jira/browse/TIKA-2170
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.13
         Environment: Windows 10
            Reporter: Tim Kingsbury


When ForkParser is run repeatedly in a for-loop against a single large 
Microsoft Word DOCX file, it fails intermittently. Sometimes it fails on 
the very first iteration; sometimes it completes several iterations 
before failing. The results are inconsistent from run to run.

A small test application is enclosed. For the test, I use a Word docx 
containing the full text of "War and Peace" (2.8 MB, 1141 pages of text).
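
The enclosed test application is not reproduced in this message. As a rough sketch only, the repeated-run loop described above could look something like the following. The class name RepeatHarness and the method failedIterations are hypothetical, and the stand-in task merely simulates a failure so that the harness runs without Tika on the classpath. The commented-out ForkParser call assumes the Tika 1.x API and a file path passed as the first argument; it is not taken from the enclosed application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical harness: runs one task repeatedly and records which iterations threw. */
final class RepeatHarness {

    /** Returns the 1-based indices of the iterations in which the task threw an exception. */
    static List<Integer> failedIterations(Callable<?> task, int iterations) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 1; i <= iterations; i++) {
            try {
                task.call();
            } catch (Exception e) {
                // Record the failure and keep going, so one failure does not hide later ones.
                failed.add(i);
                System.err.println("iteration " + i + " failed: " + e);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in task (an assumption for illustration): fails on every third call.
        // With Tika 1.x on the classpath, the task body might instead be:
        //   ForkParser parser = new ForkParser(
        //           RepeatHarness.class.getClassLoader(), new AutoDetectParser());
        //   try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
        //       parser.parse(in, new BodyContentHandler(-1), new Metadata(), new ParseContext());
        //   } finally {
        //       parser.close();
        //   }
        int[] calls = {0};
        Callable<Void> standIn = () -> {
            if (++calls[0] % 3 == 0) {
                throw new IllegalStateException("simulated failure");
            }
            return null;
        };
        System.out.println("failed iterations: " + failedIterations(standIn, 10));
    }
}
```

Recording the failing iteration numbers, rather than stopping at the first exception, makes it easier to compare how the failure pattern changes between runs.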



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
