Tomas Safarik created TIKA-1194:
-----------------------------------

             Summary: Missing text from MS Word (DOC) file
                 Key: TIKA-1194
                 URL: https://issues.apache.org/jira/browse/TIKA-1194
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
            Reporter: Tomas Safarik
            Priority: Critical


Hello,

We noticed that the text filtered from some MS Word DOC files is missing one 
line that is present (in a table) in the original document.

- If you add or remove one character anywhere before the problematic line, 
the filtered text is correct. If you restore the text to its original state, 
the filtering problem returns.
- If the file is re-saved as DOCX, filtering works fine.

I will provide a sample document. Please let me know if more information is 
needed.
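
One way to pin down the missing line is to compare the text filtered from the 
DOC file against the text filtered from the same document re-saved as DOCX. 
The following is only a minimal sketch under stated assumptions: 
MissingLineCheck and missingLines are hypothetical names, not part of Tika, 
and the two input strings are assumed to have been extracted beforehand (for 
example with the Tika facade's parseToString(File) method). Lines are compared 
after trimming, so differences in how the two parsers emit whitespace or table 
cell separators can still produce false positives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of Tika.
final class MissingLineCheck {

    // Returns the non-blank lines of `reference` (e.g. text from the DOCX copy)
    // that do not occur in `extracted` (e.g. text from the DOC file).
    // Lines are trimmed before comparison, and repeated lines are matched
    // one-to-one, so a line that occurs twice in the reference but only once
    // in the extracted text is reported once.
    static List<String> missingLines(String reference, String extracted) {
        // Count how often each trimmed, non-blank line occurs in the extracted text.
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : extracted.split("\\R")) {
            String key = line.trim();
            if (!key.isEmpty()) {
                remaining.merge(key, 1, Integer::sum);
            }
        }

        // Walk the reference in order and consume matching lines from the counts.
        List<String> missing = new ArrayList<>();
        for (String line : reference.split("\\R")) {
            String key = line.trim();
            if (key.isEmpty()) {
                continue;
            }
            Integer count = remaining.get(key);
            if (count == null || count == 0) {
                missing.add(key);
            } else {
                remaining.put(key, count - 1);
            }
        }
        return missing;
    }
}
```

Calling MissingLineCheck.missingLines(docxText, docText) on the two extracted 
strings should list the table line that the DOC filter drops, provided the 
DOCX output is otherwise equivalent.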

Regards,

Tomas



--
This message was sent by Atlassian JIRA
(v6.1#6144)
