Matthew Williams created TIKA-1767:
--------------------------------------

             Summary: Values of .doc dropdowns are not parsed correctly
                 Key: TIKA-1767
                 URL: https://issues.apache.org/jira/browse/TIKA-1767
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.10
         Environment: Windows 8.1
            Reporter: Matthew Williams
            Priority: Minor


I am attempting to parse a Word document (.doc) into XHTML using a 
```ToXMLContentHandler``` that writes to an output stream.

Everything is parsed correctly except dropdowns. Regardless of which option is 
selected, the XML output contains the literal text FORMDROPDOWN instead of the 
selected value.
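
A minimal sketch for spotting this symptom in XHTML that has already been 
serialized, using only the JDK. The class name, method name, and sample markup 
are hypothetical and are not part of Tika; the only detail taken from the report 
is the literal FORMDROPDOWN text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Tika class): counts how often the literal
// FORMDROPDOWN placeholder appears in already-generated XHTML text.
final class DropdownPlaceholderCheck {

    // The text the report says is emitted in place of the selected option.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\bFORMDROPDOWN\\b");

    static int countPlaceholders(String xhtml) {
        Matcher matcher = PLACEHOLDER.matcher(xhtml);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample markup, for illustration only.
        String sample = "<table><tr><td>Status</td><td> FORMDROPDOWN </td></tr></table>";
        System.out.println("placeholders found: " + countPlaceholders(sample));
    }
}
```

A non-zero count on the parser's output would indicate that at least one 
dropdown was emitted as the placeholder rather than as its selected value.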

Interestingly, if I save the document as a PDF (in Microsoft Word) and then use 
the same ```ToXMLContentHandler```, all the information is extracted correctly, 
but the output is essentially useless for parsing because everything is emitted 
as paragraphs rather than kept in the tables found in the original document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
