Jonas Wilhelmsson created TIKA-1044:
---------------------------------------

             Summary: Can't parse Word files with no format set
                 Key: TIKA-1044
                 URL: https://issues.apache.org/jira/browse/TIKA-1044
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0
            Reporter: Jonas Wilhelmsson
            Priority: Trivial
         Attachments: test2.doc, test.docx

We came across this Tika bug while using Solr for indexing.
When parsing a .doc or .docx file that contains text with no format set
(the format setting inside Microsoft Word), the parser throws exceptions.
Once a format is applied to the text, the file parses correctly, without
unexpected errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
