Stuart Hendren created TIKA-2347:
------------------------------------
Summary: Underlined text is not decorated as such when extracting from Word documents
Key: TIKA-2347
URL: https://issues.apache.org/jira/browse/TIKA-2347
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.14, 2.0
Reporter: Stuart Hendren
When extracting text from doc and docx files, bold and italic text decoration
is preserved in the output, but underlining is not. This can be demonstrated in
WordParserTest, or in OOXMLParserTest (after changing the test file to docx),
with the following test case.
{code:title=WordParserTest.java|borderStyle=solid}
@Test
public void testTextDecoration() throws Exception {
    XMLResult result = getXML("testWORD_various.doc");
    String xml = result.xml;
    assertTrue(xml.contains("<b>Bold</b>"));
    assertTrue(xml.contains("<i>italic</i>"));
    assertTrue(xml.contains("<u>underline</u>"));
}
{code}
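The test above stops at the first failing assertion, so it does not say which decorations are missing. As a rough sketch only (not part of Tika), the same three checks could be collected into a small helper that lists every missing fragment. The class name DecorationCheck, the method missingDecorations and the sample XHTML string are all hypothetical and exist only for illustration; the expected fragments are copied from the test case above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Tika class: reports which expected
// decoration fragments are absent from an extracted XHTML string.
class DecorationCheck {

    // Expected fragments, copied from the test case in this report.
    private static final String[] EXPECTED = {
        "<b>Bold</b>", "<i>italic</i>", "<u>underline</u>"
    };

    // Returns the expected fragments that do not occur in the given XHTML.
    static List<String> missingDecorations(String xml) {
        List<String> missing = new ArrayList<>();
        for (String fragment : EXPECTED) {
            if (!xml.contains(fragment)) {
                missing.add(fragment);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample resembling the reported behaviour: bold and italic
        // are kept, the underline markup is lost. Not real parser output.
        String xml = "<p><b>Bold</b> and <i>italic</i> and underline</p>";
        System.out.println("Missing: " + missingDecorations(xml));
    }
}
```

In a JUnit test, the result could be compared against an empty list so that the failure message names every missing fragment at once.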