Sean Story created TIKA-2179:
--------------------------------
Summary: WordMLParser fails to parse a word xml file
Key: TIKA-2179
URL: https://issues.apache.org/jira/browse/TIKA-2179
Project: Tika
Issue Type: Bug
Affects Versions: 1.14
Environment: OSX, java 8
Reporter: Sean Story
Priority: Minor
h3. Problem
I have a sample word.xml file that neither OOXMLParser nor OfficeParser can
parse. OOXMLParser throws an exception that was {{Caused by:
org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException: The supplied
data appears to be a raw XML file. Formats such as Office 2003 XML are not
supported}}, and OfficeParser throws an exception like
{{org.apache.poi.poifs.filesystem.NotOLE2FileException: The supplied data
appears to be a raw XML file. Formats such as Office 2003 XML are not
supported}}.
I found TIKA-1958, which mentions the new WordMLParser, so I downloaded the
source, built it, and updated my Tika version to 1.14. However, when I parse
the file with WordMLParser, the extracted text content is the empty string
{{""}}, whereas I expect something more like:
{noformat}
It means that the guy that you are trading with was reported for a scam
attempt. As the others mentioned, some of these BOFA could be false.
What's important is the current trade that you are doing.
If everything seems to be in order then there is nothing wrong with going
through with the trade.
Auti, Sneha (QAPM)
{noformat}
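For anyone trying to reproduce this, here is a minimal standalone sketch of the kind of text extraction I would expect. It uses plain StAX from the JDK and is not Tika's WordMLParser. The class name {{WordMlTextSketch}} is hypothetical, and it assumes the file uses the Word 2003 XML namespace with {{w:p}} paragraphs and {{w:t}} text runs. If a check like this also returned an empty string for a given file, that would suggest the file uses a different namespace or structure than assumed here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Tika or POI.
public class WordMlTextSketch {

    // Namespace used by Word 2003 XML documents (assumed for the sample file).
    static final String WORDML_2003_NS =
            "http://schemas.microsoft.com/office/word/2003/wordml";

    /** Collects the text of all w:t runs, ending each w:p paragraph with a newline. */
    public static String extractText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        StringBuilder out = new StringBuilder();
        boolean inText = false; // true while inside a w:t element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (isWordMl(reader, "t")) {
                    inText = true;
                }
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                if (inText) {
                    out.append(reader.getText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (isWordMl(reader, "t")) {
                    inText = false;
                } else if (isWordMl(reader, "p")) {
                    out.append('\n');
                }
            }
        }
        reader.close();
        return out.toString();
    }

    // True if the current element is <localName> in the Word 2003 namespace.
    private static boolean isWordMl(XMLStreamReader reader, String localName) {
        return WORDML_2003_NS.equals(reader.getNamespaceURI())
                && localName.equals(reader.getLocalName());
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<w:wordDocument xmlns:w=\"" + WORDML_2003_NS + "\">"
                + "<w:body><w:p><w:r><w:t>Sample paragraph</w:t></w:r></w:p></w:body>"
                + "</w:wordDocument>";
        System.out.print(extractText(sample));
    }
}
```

The namespace check is deliberate: elements named {{t}} or {{p}} in any other namespace are ignored, so a document that is not Word 2003 XML yields no text rather than misleading output.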
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)