Jonas Wilhelmsson created TIKA-1044:
---------------------------------------
Summary: Can't parse Word files with no format set
Key: TIKA-1044
URL: https://issues.apache.org/jira/browse/TIKA-1044
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.0
Reporter: Jonas Wilhelmsson
Priority: Trivial
Attachments: test2.doc, test.docx
While using Solr for indexing, we came across this Tika bug.
When parsing a .doc or .docx file that contains text with no format set
(i.e. no format applied to the text in Microsoft Word), the parser throws
exceptions. If a format is applied to the text, the same file is parsed
correctly without errors.
--
This message is automatically generated by JIRA.