[ https://issues.apache.org/jira/browse/PDFBOX-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194619#comment-13194619 ]
Timo Boehme commented on PDFBOX-1213:
-------------------------------------

I cannot see why the DOCTYPE declaration is a problem. Maybe something is wrong with your SAX parser configuration, e.g. it is trying to read the DTD? At the very least, whether the DOCTYPE is added should be configurable. To make later XML processing easier, I would propose changing the HTML doctype to XHTML.

> Adding style information to the PDF to HTML converter
> -----------------------------------------------------
>
>                 Key: PDFBOX-1213
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1213
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Enrique Pérez
>         Attachments: diff.patch
>
>
> This patch modifies the PDF to HTML conversion in order to add style
> information (bold, italic and font size) to the resulting file. Moreover, we
> have deleted the "DOCTYPE" header because some parsers throw the following
> exception:
> [Fatal Error] loose.dtd:31:3: The declaration for the entity "HTML.Version"
> must end with '>'.
> org.xml.sax.SAXParseException: The declaration for the entity "HTML.Version"
> must end with '>'.
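
For reference, the parser-configuration point raised in the comment could be handled on the consumer side roughly as follows. This is only a sketch, assuming the JDK's built-in JAXP SAX parser; the class name `DoctypeTolerantParse`, the handler `CountingHandler` and the sample document are made up for illustration and are not part of PDFBox or of the attached patch. The handler answers every external entity request with an empty stream, so the parser never tries to read `loose.dtd`.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of PDFBox.
final class DoctypeTolerantParse {

    /** Counts elements and replaces any external DTD with an empty stream. */
    static final class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Hand back an empty document instead of fetching the DTD
            // named in the DOCTYPE (e.g. loose.dtd).
            return new InputSource(new StringReader(""));
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses the given markup and returns the number of elements seen. */
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        // SAXParser.parse registers the handler as entity resolver as well.
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with the HTML 4.01 Transitional DOCTYPE.
        String html =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
            + "\"http://www.w3.org/TR/html4/loose.dtd\">"
            + "<html><head><title>t</title></head>"
            + "<body><p>Hello</p></body></html>";
        System.out.println(countElements(html));
    }
}
```

Because the DTD is replaced by an empty stream, named entities that are declared only in that DTD are no longer available, so this approach fits output that uses numeric character references or plain text.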