[ https://issues.apache.org/jira/browse/TIKA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253511#comment-17253511 ]
Nick Burch commented on TIKA-3254:
----------------------------------
Tika tries to give you clean, semantically meaningful XHTML.
It deliberately doesn't give you the "Word -> Save As -> HTML"
fully-featured-mess...
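
One way to see which markup actually comes through is to walk the XHTML string
returned by handler.toString() and list the element names it contains. The
following is only a rough sketch using the JDK's SAX parser. The class name
XhtmlMarkupSummary and the sample string are made up for illustration; the
sample is not real Tika output for the attached file.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records which elements appear in an XHTML string
// and whether any element carries an inline "style" attribute.
public class XhtmlMarkupSummary extends DefaultHandler {
    final Set<String> elements = new LinkedHashSet<>();
    boolean sawStyleAttribute = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default parser is not namespace-aware, so qName is the plain tag name.
        elements.add(qName);
        if (atts.getValue("style") != null) {
            sawStyleAttribute = true;
        }
    }

    static XhtmlMarkupSummary summarize(String xhtml) throws Exception {
        XhtmlMarkupSummary summary = new XhtmlMarkupSummary();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), summary);
        return summary;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input only (an assumption), standing in for handler.toString().
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title/></head>"
                + "<body><p>Plain text with <b>bold</b> and <i>italic</i> runs.</p></body></html>";
        XhtmlMarkupSummary s = summarize(sample);
        System.out.println("elements: " + s.elements);
        System.out.println("style attributes present: " + s.sawStyleAttribute);
    }
}
```

If the summary shows only structural and emphasis elements and no style
attributes, the font details were most likely never emitted rather than lost
later in the pipeline.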
> Html font styles missing - doc to html
> --------------------------------------
>
> Key: TIKA-3254
> URL: https://issues.apache.org/jira/browse/TIKA-3254
> Project: Tika
> Issue Type: Bug
> Reporter: Sathia
> Priority: Major
> Attachments: Sample.doc
>
>
> Hi Team,
> I tried to convert a .doc file to XHTML using Tika. The conversion succeeds,
> but the styles are missing.
>
> I have attached the *sample.doc* that I used. Below is the code I used for
> the conversion.
>
> public String parseToHTML() throws IOException, SAXException, TikaException {
>     ContentHandler handler = new ToXMLContentHandler();
>
>     AutoDetectParser parser = new AutoDetectParser();
>     Metadata metadata = new Metadata();
>     try (InputStream stream = ContentHandlerExample.class.getResourceAsStream("test.doc")) {
>         parser.parse(stream, handler, metadata);
>         return handler.toString();
>     }
> }
>
> Regards,
> Sathia
--
This message was sent by Atlassian Jira
(v8.3.4#803005)