Aleksandr Dubinsky created TIKA-2606:
----------------------------------------

Summary: Tika.parseToString of particular docx results in duplicate text
Key: TIKA-2606
URL: https://issues.apache.org/jira/browse/TIKA-2606
Project: Tika
Issue Type: Bug
Affects Versions: 1.17
Reporter: Aleksandr Dubinsky
Attachments: TalkingTeachingInquiryforInnovation.docx

Attached is a file that is not parsed correctly. Text is duplicated when read with Tika.parseToString. In the output, the text of the document appears first, then a corrupted copy of the document, then another copy of the document.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
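
One way the duplication described in this report might be confirmed is to count how often a known phrase from the document occurs in the extracted text. The following is a minimal sketch under stated assumptions, not part of the original report: the class name DuplicateTextCheck, the helper countOccurrences, and the sample string are all hypothetical, and the sample string stands in for whatever Tika.parseToString returns.

```java
class DuplicateTextCheck {
    // Hypothetical helper: counts non-overlapping occurrences of `needle` in `text`.
    static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty phrase is not a meaningful probe
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the extracted text; the real input would be
        // the string returned by Tika.parseToString for the attached docx.
        String extracted = "Intro paragraph.\nBody.\nIntro paragraph.\nBody.\n";
        int hits = countOccurrences(extracted, "Intro paragraph.");
        System.out.println("occurrences: " + hits);
    }
}
```

With Tika on the classpath, the input string would presumably come from `new Tika().parseToString(new File("TalkingTeachingInquiryforInnovation.docx"))`, which can throw IOException and TikaException. A phrase known to appear once in the document but counted more than once in the output would be consistent with the duplication reported here.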