[ https://issues.apache.org/jira/browse/TIKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820139#comment-13820139 ]
Tomas Safarik commented on TIKA-1194:
-------------------------------------
I can see that the text is missing from the Apache POI WordToTextConverter
output, but in the debugger I can see it intact in one of the variables.
Should I move this issue to the Apache POI bug tracker?
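
To narrow down which layer drops the line, one option is to check the text
returned by each extractor against a few snippets known to be in the document.
The following is only a minimal sketch under stated assumptions: the class
name, the method names and the sample strings are hypothetical, and the
extracted text is assumed to have been obtained separately (once from Tika and
once from POI) and passed in as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika or POI.
final class MissingTextCheck {

    // Collapse runs of whitespace so that line-wrapping differences
    // between extractors do not cause false mismatches.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // Return every expected snippet that does not occur in the extracted text.
    static List<String> missingLines(List<String> expected, String extracted) {
        String haystack = normalize(extracted);
        List<String> missing = new ArrayList<>();
        for (String line : expected) {
            String needle = normalize(line);
            if (!needle.isEmpty() && !haystack.contains(needle)) {
                missing.add(line);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the real table cells
        // and for the text produced by each extractor.
        List<String> expected = Arrays.asList("first cell", "second cell");
        String tikaText = "first cell\n";
        String poiText = "first cell\n";

        System.out.println("Missing from Tika output: " + missingLines(expected, tikaText));
        System.out.println("Missing from POI output:  " + missingLines(expected, poiText));
    }
}
```

If the same snippet is reported missing for both outputs, that would suggest
the loss happens in POI rather than in Tika's own handling.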
> Missing text from MS Word (DOC) file
> ------------------------------------
>
> Key: TIKA-1194
> URL: https://issues.apache.org/jira/browse/TIKA-1194
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Reporter: Tomas Safarik
> Priority: Critical
>
> Hello,
> we noticed that the filtered text from some MS Word DOC files is missing one
> line (in a table cell) that is present in the original document.
> - If you add or remove one character anywhere before the problematic
> line/cell, the filtered text is correct. If you restore the original text,
> the filtering problem returns.
> - If the file is resaved as DOCX, filtering works fine.
> I will provide a sample document. Please let me know if more information is
> needed.
> Regards,
> Tomas