[ https://issues.apache.org/jira/browse/TIKA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch resolved TIKA-1044.
------------------------------
Resolution: Fixed
Fix Version/s: 1.3
Fixed in r1421646, along with a unit test based on your files, thanks!
> Can't parse Word files with no format set
> -----------------------------------------
>
> Key: TIKA-1044
> URL: https://issues.apache.org/jira/browse/TIKA-1044
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.0
> Reporter: Jonas Wilhelmsson
> Priority: Trivial
> Fix For: 1.3
>
> Attachments: test2.doc, test.docx
>
>
> While using Solr for indexing, we came across this Tika bug.
> When parsing a doc or docx file that contains text with no format set
> (a format inside Microsoft Word), the parser throws exceptions.
> Once a format is set on the text, the file parses correctly without
> unexpected errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira