Matthew Williams created TIKA-1767:
--------------------------------------
Summary: Values of .doc dropdowns are not parsed correctly
Key: TIKA-1767
URL: https://issues.apache.org/jira/browse/TIKA-1767
Project: Tika
Issue Type: Bug
Affects Versions: 1.10
Environment: Windows 8.1
Reporter: Matthew Williams
Priority: Minor
I am attempting to parse a Word document into XHTML using a
```ToXMLContentHandler``` that writes to an output stream.
Everything is parsed correctly except dropdowns. Regardless of which option is
selected, the XML output contains the literal text FORMDROPDOWN instead of the
selected value.
Interestingly, if I save the document as a PDF (in Microsoft Word) and then
parse it with the same ```ToXMLContentHandler```, all the information is
extracted correctly, but the result is essentially useless to parse because
everything comes out as paragraphs rather than staying in the tables found in
the original document.
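
To check for the symptom automatically, one option is to scan the generated XHTML for the literal FORMDROPDOWN text. The following is only a minimal sketch, assuming the handler output has already been collected into a String. It does not call Tika at all, and the class name, method name, and sample fragment are hypothetical.

```java
public class DropdownPlaceholderCheck {
    // Literal text that shows up in the output in place of the selected option.
    static final String PLACEHOLDER = "FORMDROPDOWN";

    // Counts non-overlapping occurrences of the placeholder in the XHTML text.
    static int countPlaceholders(String xhtml) {
        int count = 0;
        int from = 0;
        while ((from = xhtml.indexOf(PLACEHOLDER, from)) != -1) {
            count++;
            from += PLACEHOLDER.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical fragment of handler output, not taken from a real document.
        String xhtml = "<table><tr><td>Status</td><td>FORMDROPDOWN</td></tr></table>";
        System.out.println("Unresolved dropdowns: " + countPlaceholders(xhtml));
    }
}
```

A non-zero count means at least one dropdown value was lost. It says nothing about which option was selected in the original document.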