[ https://issues.apache.org/jira/browse/PDFBOX-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194697#comment-13194697 ]
Timo Boehme commented on PDFBOX-1213:
-------------------------------------

In my opinion the proposed changes to PDFTextStripper are tailored too closely to this one use case. I think we need a more general solution here, because more parameters can sometimes be extracted from the font definitions. I would propose a fontChanged notification, perhaps as a listener pattern: if no listeners are registered, we can skip the cycles spent on extracting font information.

    interface FontChangedListener {
        public void fontChanged( FontInformation _fInfo );
    }

    // sketch only; method bodies omitted
    class FontInformation {
        public boolean isBold();
        public boolean isItalic();
        public boolean isRoman();
        public boolean isSansSerif();
        public String getFontName();
        public float getFontSizePt();
    }

    class PDFTextStripper {
        ...
        protected List<FontChangedListener> fontListeners = new LinkedList<FontChangedListener>();
        ...
        public void registerFontListener( FontChangedListener listener ) {
            fontListeners.add( listener );
        }

        writePage() {
            ...
            if ( ! fontListeners.isEmpty() ) {
                // test for font changes and notify listeners
            }
            ...
        }
    }

In PDFText2HTML you have to keep track of whether a span was opened with font style information, and close it before closing other tags.

> Adding style information to the PDF to HTML converter
> -----------------------------------------------------
>
>                 Key: PDFBOX-1213
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1213
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Enrique Pérez
>         Attachments: diff.patch
>
>
> This patch modifies the PDF to HTML conversion in order to add style
> information (bold, italic and font size) to the resulting file. Moreover, we
> have deleted the "DOCTYPE" header because some parsers throw the following
> exception:
> [Fatal Error] loose.dtd:31:3: The declaration for the entity "HTML.Version"
> must end with '>'.
> org.xml.sax.SAXParseException: The declaration for the entity "HTML.Version"
> must end with '>'.

--
This message is automatically generated by JIRA.
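As a rough illustration of the span bookkeeping mentioned in the comment for PDFText2HTML, the following is a minimal sketch and not PDFBox code. It assumes a simplified FontInformation value class (the comment only lists its getters) and a hypothetical SpanTracker listener that closes any open style span before opening the next one. The class names SpanTracker and FontSpanDemo, the constructor arguments, and the CSS that is emitted are all assumptions made for this example.

```java
// Hypothetical, simplified stand-in for the FontInformation proposed in the comment.
final class FontInformation {
    private final float fontSizePt;
    private final boolean bold;
    private final boolean italic;

    FontInformation(float fontSizePt, boolean bold, boolean italic) {
        this.fontSizePt = fontSizePt;
        this.bold = bold;
        this.italic = italic;
    }

    boolean isBold() { return bold; }
    boolean isItalic() { return italic; }
    float getFontSizePt() { return fontSizePt; }
}

// Listener interface as proposed in the comment.
interface FontChangedListener {
    void fontChanged(FontInformation fInfo);
}

// Hypothetical helper: remembers whether a style span is open so that it
// can be closed before the next span starts or before other tags are closed.
final class SpanTracker implements FontChangedListener {
    private final StringBuilder out;
    private boolean spanOpen = false;

    SpanTracker(StringBuilder out) {
        this.out = out;
    }

    @Override
    public void fontChanged(FontInformation fInfo) {
        closeSpanIfOpen(); // never nest style spans
        StringBuilder style = new StringBuilder();
        if (fInfo.isBold()) {
            style.append("font-weight:bold;");
        }
        if (fInfo.isItalic()) {
            style.append("font-style:italic;");
        }
        style.append("font-size:").append(fInfo.getFontSizePt()).append("pt;");
        out.append("<span style=\"").append(style).append("\">");
        spanOpen = true;
    }

    // Call this before emitting any other closing tag such as </p>.
    void closeSpanIfOpen() {
        if (spanOpen) {
            out.append("</span>");
            spanOpen = false;
        }
    }
}

final class FontSpanDemo {
    // Simulates two font changes inside one paragraph.
    static String render() {
        StringBuilder html = new StringBuilder("<p>");
        SpanTracker tracker = new SpanTracker(html);
        tracker.fontChanged(new FontInformation(12f, true, false));
        html.append("Title");
        tracker.fontChanged(new FontInformation(10f, false, false));
        html.append("body");
        tracker.closeSpanIfOpen(); // close the span before closing the paragraph
        html.append("</p>");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

In a real integration the tracker would presumably be registered through the proposed listener registration on PDFTextStripper, and closeSpanIfOpen() would be called wherever PDFText2HTML closes a paragraph or other enclosing tag.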