Pasted content from Word (/me shivers). Word ain't the best thing for producing HTML :) So it's Word that produces the <font> tags. Maybe Word can be configured to output pt instead of px and in, to produce fixed-width tables, or not to nest tables. But I doubt it.
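
If Word can't be made to emit points, the px and in lengths could also be converted before the HTML reaches XMLWorker. Below is a minimal sketch. It assumes the standard CSS ratios (1in = 72pt = 96px, so 1px = 0.75pt). `CssUnits` is a hypothetical helper, not part of XMLWorker, and XMLWorker's own internal conversion may differ slightly.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not XMLWorker API): rewrites px and in lengths in CSS text as pt. */
public class CssUnits {
    // A number immediately followed by a px or in unit, e.g. "12px" or "0.5in".
    private static final Pattern LENGTH = Pattern.compile("(\\d+(?:\\.\\d+)?)(px|in)\\b");

    /** Converts one length to points using the CSS ratios 1in = 72pt = 96px. */
    static double toPt(double value, String unit) {
        switch (unit) {
            case "px": return value * 72.0 / 96.0; // 1px = 0.75pt
            case "in": return value * 72.0;        // 1in = 72pt
            default: throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    /** Replaces every px/in length in the given CSS text with its pt equivalent. */
    static String rewrite(String css) {
        Matcher m = LENGTH.matcher(css);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            double pt = toPt(Double.parseDouble(m.group(1)), m.group(2));
            // Locale.ROOT keeps '.' as the decimal separator whatever the JVM locale is
            m.appendReplacement(out, String.format(Locale.ROOT, "%.2fpt", pt));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: width: 72.00pt; margin-left: 36.00pt; font-size: 12.00pt
        System.out.println(rewrite("width: 96px; margin-left: 0.5in; font-size: 16px"));
    }
}
```

The plain regex keeps the sketch short; it would also rewrite matching text outside style attributes, so a real version should only be applied to CSS values.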

I think your best option is to implement a TagProcessor for font, or to add font to the TagProcessorFactory so that it maps to an existing processor. You can't use the default settings for that. See http://demo.itextsupport.com/xmlworker/itextdoc/flatsite.html#itextdoc-menu-7 for an example of how to set up the whole thing yourself. (This documentation is still a work in progress, but it is certainly good enough to get you started.)

From

Tags.getHtmlTagProcessorFactory()

you get a DefaultTagProcessorFactory, in which you can map font to Span (http://api.itextpdf.com/xml/com/itextpdf/tool/xml/html/Span.html) using addProcessor (http://api.itextpdf.com/xml/com/itextpdf/tool/xml/html/DefaultTagProcessorFactory.html#addProcessor(java.lang.String,%20com.itextpdf.tool.xml.html.TagProcessor)). This way the text inside the font tag will end up in the PDF and be handled as if it were a span tag. Bear in mind that the attributes of the font tag won't be taken into account; the same goes for some of the attributes of the table tag (check the CSS Support section of the documentation to be sure). You could also write your own TagProcessor and add the Chunk or Paragraph you create to the ProcessObject.
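
The factory mapping keeps the text but drops the font attributes. A different workaround, not the approach described above, is to rewrite the Word markup before it reaches XMLWorker, turning each <font> into a <span> whose inline style carries the color and face. This is a minimal sketch, assuming double-quoted attribute values and no font tags inside comments or scripts. `FontToSpan` is a hypothetical helper, and whether XMLWorker honours color and font-family on a span should be checked against the CSS Support section.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processing step: turns <font> tags into <span> tags with inline CSS. */
public class FontToSpan {
    // Opening <font ...> tag; group 1 holds the raw attribute text.
    private static final Pattern OPEN = Pattern.compile("<font\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CLOSE = Pattern.compile("</font\\s*>", Pattern.CASE_INSENSITIVE);
    // name="value" pairs; only double-quoted values are handled in this sketch.
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    static String convert(String html) {
        Matcher m = OPEN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder style = new StringBuilder();
            Matcher a = ATTR.matcher(m.group(1));
            while (a.find()) {
                String name = a.group(1).toLowerCase(Locale.ROOT);
                String value = a.group(2);
                if (name.equals("color")) {
                    style.append("color:").append(value).append(';');
                } else if (name.equals("face")) {
                    style.append("font-family:").append(value).append(';');
                }
                // other attributes (e.g. size) are dropped in this sketch
            }
            String span = style.length() == 0 ? "<span>" : "<span style=\"" + style + "\">";
            m.appendReplacement(out, Matcher.quoteReplacement(span));
        }
        m.appendTail(out);
        // closing tags carry no attributes, so a plain replacement is enough
        return CLOSE.matcher(out.toString()).replaceAll("</span>");
    }

    public static void main(String[] args) {
        // prints: <p><span style="color:red;font-family:Arial;">Test</span></p>
        System.out.println(convert("<p><font color=\"red\" face=\"Arial\">Test</font></p>"));
    }
}
```

The converted string can then be passed to XMLWorker (for example through a StringReader) instead of the raw CKEditor output.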

Normal output from CKEditor should work (in the demo we used TinyMCE), but we also noticed that HTML pasted from Word usually didn't end up the way you would want. XMLWorker was never intended to transform Word documents, even when they are exported to HTML. We didn't consider Word HTML a good HTML reference :) It's usually not even valid HTML.

We based ourselves on the W3C spec of XHTML and a bit of HTML5, which of course resulted in a more restrictive framework than Internet browsers.

Perhaps you can fix some of these things by adding your own CSS file in which you set certain CSS values yourself, as long as they are not overridden by CSS properties added later.


I hope I gave you some helpful directions and ideas.


Kind Regards
Balder

On 15/10/2011 16:04, Mark Ramos wrote:
Thanks again, Balder. The only challenge we have is that all this HTML comes from a CKEditor used to create content in Liferay. Also, since we use CKEditor, one of the scenarios is pasting content from a Word document directly into CKEditor. When the content is then rendered as HTML, we have to export it to PDF. We also tried Flying Saucer, which also uses iText 2.0.8, but some items are not rendered properly there either. I appreciate you giving us your time.

Thank you very much!

Mark

On Sat, Oct 15, 2011 at 9:51 PM, Balder VC <li...@redlab.be> wrote:

    Hi,

    A PDF is not a browser; while creating your HTML you should still
    bear in mind that the end result will be a PDF.

    A couple of tips:
    - It's better to write measures in points (pt); then no conversion
      is done by the XMLWorker.
    - It's a good idea to check the supported tags (they are listed in
      the documentation, and the defaults are in
      com.itextpdf.tool.xml.html.Tags).
    - The <font> tag used in the HTML file is not supported; that is
      why, among other things, the 'Test' text is missing. You can
      easily write a TagProcessor that supports the font tag as you
      see fit.
    - If I'm correct, it is better to define a width for your tables;
      then the XMLWorker does not have to try to fit the text in them.
    - Nesting tables is possible, but it makes it harder for the
      XMLWorker to fit tables on the page.



    Regards
    Balder


    On 14/10/2011 8:33, Mark Ramos wrote:
    Hi,

    Thanks for the links Balder.

    I tried to render the enclosed html file to pdf and I did not get
    a good result. Please check the attachments.

    I used this code snippet:

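        import java.io.FileOutputStream;
        import java.io.FileReader;
        import com.itextpdf.text.Document;
        import com.itextpdf.text.PageSize;
        import com.itextpdf.text.pdf.PdfWriter;
        import com.itextpdf.tool.xml.XMLWorkerHelper;

        // Letter-sized document, written to html3.pdf
        Document document = new Document(PageSize.LETTER);
        PdfWriter instance = PdfWriter.getInstance(document,
                new FileOutputStream("/home/mramos/html3.pdf"));
        document.open();
        // Parse the (X)HTML file and add its content to the document
        FileReader br = new FileReader("/home/mramos/pdf_cfadmin2.html");
        XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
        worker.parseXHtml(instance, document, br);
        document.close();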

    Any help is much appreciated.


    Many thanks!




--
twitter <http://twitter.com/redlabbe>
redlab-log <http://www.redlab.be/blog/>
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
