Leo created TIKA-1181:
-------------------------
Summary: RTFParser not keeping HTML font colors and underscore
tags.
Key: TIKA-1181
URL: https://issues.apache.org/jira/browse/TIKA-1181
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Environment: Windows server 2008
Reporter: Leo
Hi,
I'm having problems with the code below. The HTML it generates from the RTF
string does not contain the font colors or the underline ("<u></u>") tags.
Is there anything I can do to preserve them?
Code:
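import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.rtf.RTFParser;

// Parse the RTF string and serialize the SAX events Tika emits as XHTML.
InputStream in = new ByteArrayInputStream(rtfString.getBytes("UTF-8"));
RTFParser parser = new RTFParser();
Metadata metadata = new Metadata();
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
handler.setResult(new StreamResult(sw));
parser.parse(in, handler, metadata, new ParseContext());
String xhtml = sw.toString();
// Insert a <br> before every CRLF in the serialized output.
xhtml = xhtml.replaceAll("\r\n", "<br>\r\n");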
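To help narrow this down, here is a minimal sketch that uses only the JDK (no Tika) and the same TransformerHandler setup as above. It feeds hand-written SAX events to the handler in place of the parser. The class name SaxSerializerSketch, the render method, and the emitUnderline flag are hypothetical names made up for this illustration, and OMIT_XML_DECLARATION is set only to keep the output short. As far as I can tell, the handler serializes exactly the events it receives, so a missing "<u>" in the output would suggest the parser never emitted that element rather than the serializer dropping it.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class: not part of Tika, for illustration only.
class SaxSerializerSketch {

    // Serializes a small hand-made SAX event stream, standing in for
    // whatever events a parser would send to the handler.
    public static String render(boolean emitUnderline) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        // Assumption for readability: skip the <?xml ...?> declaration.
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter sw = new StringWriter();
        handler.setResult(new StreamResult(sw));

        AttributesImpl noAttrs = new AttributesImpl();
        char[] text = "hello".toCharArray();

        handler.startDocument();
        handler.startElement("", "p", "p", noAttrs);
        if (emitUnderline) {
            handler.startElement("", "u", "u", noAttrs);
        }
        handler.characters(text, 0, text.length);
        if (emitUnderline) {
            handler.endElement("", "u", "u");
        }
        handler.endElement("", "p", "p");
        handler.endDocument();

        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(true));   // events include a <u> element
        System.out.println(render(false));  // events omit the <u> element
    }
}
```

Running main prints both variants, which makes it easy to compare the serialized output with and without the underline events.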
Thanks for looking at it.
Leo
--
This message was sent by Atlassian JIRA
(v6.1#6144)