Leo created TIKA-1181:
-------------------------

             Summary: RTFParser not keeping HTML font colors and underscore tags.
                 Key: TIKA-1181
                 URL: https://issues.apache.org/jira/browse/TIKA-1181
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
         Environment: Windows server 2008
            Reporter: Leo


Hi,

I'm having a problem with the code below. The HTML it produces from the RTF 
string is missing the font colors and the underline tags ("<u></u>"). Is there 
anything I can do to keep them in the output?

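Code:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.rtf.RTFParser;

// rtfString holds the RTF document as a String
InputStream in = new ByteArrayInputStream(rtfString.getBytes("UTF-8"));
RTFParser parser = new RTFParser();
Metadata metadata = new Metadata();

// Serialize the parser's SAX events as unindented XML into a StringWriter
StringWriter sw = new StringWriter();
SAXTransformerFactory factory =
        (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
handler.setResult(new StreamResult(sw));

parser.parse(in, handler, metadata, new ParseContext());

String xhtml = sw.toString();
// Mark each line break in the output with a <br>
xhtml = xhtml.replaceAll("\r\n", "<br>\r\n");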
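
To narrow down where the tags get lost, the serialization half of the code can 
be tried on its own, without Tika. The sketch below is only an illustration: 
the class name SaxSerializeSketch, the method serialize() and the sample text 
are made up for this example, and it omits the XML declaration to keep the 
output short. It feeds a "<u>" element into the same kind of TransformerHandler 
by hand and returns what comes out.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, not part of Tika or of the code above.
public class SaxSerializeSketch {

    // Sends <p><u>underlined</u></p> through an identity TransformerHandler
    // and returns the serialized markup.
    static String serialize() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        // Assumption for this sketch only: skip the <?xml ...?> header.
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter sw = new StringWriter();
        handler.setResult(new StreamResult(sw));

        // Hand-written SAX events standing in for a parser's output.
        char[] text = "underlined".toCharArray();
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "p", "p", noAttributes);
        handler.startElement("", "u", "u", noAttributes);
        handler.characters(text, 0, text.length);
        handler.endElement("", "u", "u");
        handler.endElement("", "p", "p");
        handler.endDocument();

        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```

If the "<u>" element survives here, the handler is passing through whatever 
events it receives, which would suggest the underline and color information is 
not present in the SAX events coming from the parser.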

Thanks for looking at it.
Leo



