Tomas Safarik created TIKA-1194:
-----------------------------------
Summary: Missing text from MS Word (DOC) file
Key: TIKA-1194
URL: https://issues.apache.org/jira/browse/TIKA-1194
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Tomas Safarik
Priority: Critical
Hello,
we noticed that the filtered text from some MS Word DOC files is missing one
line that is present (inside a table) in the original document.
- If you add or remove one character anywhere before the problematic line, the
filtered text is correct. If you change the text back to the original, the
filtering problem returns.
- If the file is resaved as DOCX, filtering works fine.
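A rough way to pinpoint the dropped line is to compare the text filtered from
the DOC file against the text filtered from the DOCX resave. The sketch below
is only an illustration: the class name MissingLines and the method
missingLines are made up for this report, and the two strings are assumed to
have been obtained from Tika beforehand (for example with the Tika facade's
parseToString). It uses the JDK only and reports every non-blank line of the
reference text that does not appear in the candidate text.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingLines {

    // Hypothetical helper: returns the non-blank lines of `reference`
    // (e.g. text filtered from the DOCX resave) that do not occur in
    // `candidate` (e.g. text filtered from the DOC file). Lines are
    // compared after trimming surrounding whitespace.
    static List<String> missingLines(String reference, String candidate) {
        Set<String> seen = new HashSet<>();
        for (String line : candidate.split("\\R")) {
            seen.add(line.trim());
        }
        List<String> missing = new ArrayList<>();
        for (String line : reference.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !seen.contains(trimmed)) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder strings; in practice these would be the two
        // filtered outputs obtained from Tika.
        String fromDocx = "Header\nrow 1\nrow 2\nrow 3";
        String fromDoc = "Header\nrow 1\nrow 3";
        System.out.println(missingLines(fromDocx, fromDoc)); // prints [row 2]
    }
}
```

Because the comparison is set-based, a line that is duplicated in the
reference but occurs only once in the candidate would not be reported; for
the single missing table line described here that should be sufficient.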
I will provide a sample document. Please let me know if more information is
needed.
Regards,
Tomas
--
This message was sent by Atlassian JIRA
(v6.1#6144)