[ https://issues.apache.org/jira/browse/PDFBOX-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504559#comment-13504559 ]
Aaptha commented on PDFBOX-1213:
--------------------------------

There is a lot of inconsistency in how the formatting information for italics is marked. Sometimes italics are not marked at all, and sometimes the italic tag is never closed. The problem shows up most often on a line that mixes several italic words with normal text.

What is the strategy for subscripts and superscripts?

> Adding style information to the PDF to HTML converter
> -----------------------------------------------------
>
>                 Key: PDFBOX-1213
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1213
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Enrique Pérez
>         Attachments: diff.patch
>
>
> This patch modifies the PDF to HTML conversion in order to add style
> information (bold, italic and font size) to the resulting file. Moreover, we
> have deleted the "DOCTYPE" header because some parsers throw the following
> exception:
> [Fatal Error] loose.dtd:31:3: The declaration for the entity "HTML.Version"
> must end with '>'.
> org.xml.sax.SAXParseException: The declaration for the entity "HTML.Version"
> must end with '>'.
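An unclosed italic tag on a line that mixes italic and normal words usually means the converter opens `<i>` when italic text starts but has no rule for closing it when the style changes or the line ends. Below is a minimal sketch of one way to keep the markup balanced. It assumes the text of a line has already been grouped into runs that each carry an italic flag; how those flags are derived from the PDF fonts is left out. `StyledLineWriter` and `Run` are hypothetical names, not part of PDFBox or the attached patch, and the record requires Java 16+.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of PDFBox): renders one line of styled text
 * runs as HTML, opening <i> only when italic text starts and always closing
 * it when the style switches back or the line ends.
 */
public final class StyledLineWriter {

    /** A contiguous piece of text that shares a single italic flag. */
    public record Run(String text, boolean italic) {}

    private StyledLineWriter() {}

    public static String toHtml(List<Run> runs) {
        StringBuilder html = new StringBuilder();
        boolean inItalic = false; // true while an <i> is open and not yet closed
        for (Run run : runs) {
            if (run.italic() && !inItalic) {
                html.append("<i>");   // style switches normal -> italic
                inItalic = true;
            } else if (!run.italic() && inItalic) {
                html.append("</i>");  // style switches italic -> normal
                inItalic = false;
            }
            // Consecutive italic runs share one <i> element instead of reopening it.
            html.append(escape(run.text()));
        }
        if (inItalic) {
            html.append("</i>"); // never leave a tag dangling at the end of the line
        }
        return html.toString();
    }

    /** Minimal HTML escaping for text content ('&' must be replaced first). */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Run> line = List.of(
                new Run("The ", false),
                new Run("quick", true),
                new Run(" brown ", false),
                new Run("fox", true),
                new Run(" jumps", true));
        System.out.println(toHtml(line));
    }
}
```

Running `main` prints `The <i>quick</i> brown <i>fox jumps</i>`. Tracking the open/closed state explicitly, and flushing it at the end of every line, is what prevents a dangling `<i>`. Handling bold the same way would also require closing tags in the reverse order they were opened, so that `<b>` and `<i>` nest properly.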