Stuart Hendren created TIKA-2347:
------------------------------------

             Summary: Underlined text is not decorated as such when extracting 
from Word documents
                 Key: TIKA-2347
                 URL: https://issues.apache.org/jira/browse/TIKA-2347
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14, 2.0
            Reporter: Stuart Hendren


When extracting from doc and docx files, bold and italic text decoration is 
carried through to the output; underlining, however, is not.  This can be 
demonstrated in WordParserTest or OOXMLParserTest (changing the test file to 
the docx version) with the following test case.

{code:title=WordParserTest.java|borderStyle=solid}
    @Test
    public void testTextDecoration() throws Exception {
      XMLResult result = getXML("testWORD_various.doc");
      String xml = result.xml;

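      // Bold and italic are emitted as <b> and <i>, so these pass.
      assertTrue(xml.contains("<b>Bold</b>"));
      assertTrue(xml.contains("<i>italic</i>"));
      // Underlining is not emitted as <u>, so this assertion fails.
      assertTrue(xml.contains("<u>underline</u>"));
    }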
{code}
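
Because assertTrue stops at the first failure, it can help to list every 
missing decoration in one pass.  The following is only a sketch: 
DecorationCheck and missingDecorations are hypothetical names, not part of 
Tika, and the sample string is an invented stand-in for the parser output 
described above, not real extracted XHTML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: reports which expected
// decoration snippets are absent from an extracted XHTML string.
public class DecorationCheck {

    // expected maps a tag name (e.g. "u") to the text it should wrap.
    public static List<String> missingDecorations(String xml, Map<String, String> expected) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String tag = e.getKey();
            String snippet = "<" + tag + ">" + e.getValue() + "</" + tag + ">";
            if (!xml.contains(snippet)) {
                missing.add(snippet);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Invented stand-in for the reported behaviour (assumption):
        // bold and italic are tagged, underlined text comes out plain.
        String xml = "<p><b>Bold</b> <i>italic</i> underline</p>";

        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("b", "Bold");
        expected.put("i", "italic");
        expected.put("u", "underline");

        // Prints the snippets that were not found in the string.
        System.out.println(missingDecorations(xml, expected));
    }
}
```

In a real test, the xml string would come from getXML(...) as in the test 
case above, and the returned list could be asserted to be empty.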



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)