Re: [iText-questions] HTML to PDF with XMLWorker

Balder VC Sat, 15 Oct 2011 10:03:52 -0700

Pasted content from Word ( /me shivers ). Word ain't the best thing for producing HTML :) So it's Word that uses the <font> tag. Maybe Word can be configured to output pt instead of px and in, to produce fixed-width tables, or not to nest tables. But I have my doubts.

I think your best option is to implement a TagProcessor for font, or to add font to the TagFactory so that it maps to something that already exists. You can't use the default settings for that. See http://demo.itextsupport.com/xmlworker/itextdoc/flatsite.html#itextdoc-menu-7 for an example of how to set up the whole thing yourself. (This documentation is still a work in progress, but it is certainly good enough to get you started.)


From Tags.getHtmlTagProcessorFactory() you get a DefaultTagProcessorFactory. There you can map font to Span using addProcessor:

http://api.itextpdf.com/xml/com/itextpdf/tool/xml/html/Span.html
http://api.itextpdf.com/xml/com/itextpdf/tool/xml/html/DefaultTagProcessorFactory.html#addProcessor(java.lang.String,%20com.itextpdf.tool.xml.html.TagProcessor)

This way the text inside the font tag ends up in the PDF and is handled as if it were a span tag. Bear in mind that the attributes of the font tag won't be taken into account; the same goes for some of the attributes of the table tag (check the CSS Support section of the documentation to be sure). You could also write your own TagProcessor and add the Chunk or Paragraph you create to the ProcessObject.
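If writing a TagProcessor is not an option right away, a cruder alternative (a plain string pre-processing step, not the XMLWorker mapping described above) is to rename font tags to span before the HTML is parsed. The result is similar: the text survives, the font attributes are dropped. The sketch below uses only java.util.regex; the class and method names are hypothetical, and a regex is not an HTML parser, so unusual markup (a `>` inside an attribute value, or a self-closing `<font/>`) is not handled.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper (not part of XMLWorker): renames
// <font ...> to <span> and </font> to </span>. Like mapping font to Span,
// this discards the font tag's attributes (face, size, color).
final class FontTagRewriter {
    // An opening <font> tag, with or without attributes, in any letter case.
    private static final Pattern OPEN_FONT =
            Pattern.compile("<font(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);
    // A closing </font> tag, allowing whitespace before the '>'.
    private static final Pattern CLOSE_FONT =
            Pattern.compile("</font\\s*>", Pattern.CASE_INSENSITIVE);

    private FontTagRewriter() {}

    static String fontToSpan(String html) {
        String opened = OPEN_FONT.matcher(html).replaceAll("<span>");
        return CLOSE_FONT.matcher(opened).replaceAll("</span>");
    }
}
```

The rewritten string could then be passed to parseXHtml through a java.io.StringReader instead of the FileReader used in the snippet further down the thread.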

Normal output from CKEditor should work; in the demo we used TinyMCE. But we also noticed that HTML pasted from Word usually doesn't end up the way you would want. The XMLWorker was never intended to transform Word documents (even when they are exported to HTML). We didn't think of Word HTML as a good HTML reference :) It's usually not even valid HTML.

We based ourselves on the W3C specification of XHTML and a bit of HTML5, and that of course resulted in a more restrictive framework than internet browsers are.
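Along the same lines, some of Word's non-standard markup can be stripped before parsing. The sketch below is an assumption about typical Word output, not something XMLWorker does itself: it removes namespace-prefixed elements such as `<o:p>` and `<st1:City>`, and `mso-` CSS declarations inside style attributes. The class name is hypothetical, and this is far from a complete Word cleaner (conditional comments, for example, are left alone).

```java
import java.util.regex.Pattern;

// Hypothetical clean-up helper for Word-generated HTML (not part of XMLWorker).
final class WordHtmlCleaner {
    // Opening or closing tags with a namespace prefix, e.g. <o:p>, </o:p>, <st1:City>.
    private static final Pattern PREFIXED_TAG = Pattern.compile(
            "</?[a-z][a-z0-9]*:[a-z][a-z0-9]*[^>]*>", Pattern.CASE_INSENSITIVE);
    // One mso-* CSS declaration, e.g. "mso-margin-top-alt:auto;".
    private static final Pattern MSO_DECLARATION = Pattern.compile(
            "mso-[a-z-]+\\s*:\\s*[^;\"']*;?\\s*", Pattern.CASE_INSENSITIVE);

    private WordHtmlCleaner() {}

    static String clean(String html) {
        String withoutPrefixedTags = PREFIXED_TAG.matcher(html).replaceAll("");
        return MSO_DECLARATION.matcher(withoutPrefixedTags).replaceAll("");
    }
}
```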

Perhaps you can fix some of these things by adding your own CSS file in which you set certain CSS values yourself; they apply as long as they are not overridden by CSS properties added later.



I hope I gave you some helpful directions and ideas.


Kind Regards
Balder

On 15/10/2011 16:04, Mark Ramos wrote:

Thanks again, Balder. The only challenge we have is that all this HTML comes from CKEditor, which we use to create content in Liferay. Also, one of the scenarios with CKEditor is pasting content from a Word document directly into it. When the content is rendered as HTML, we then have to export it to PDF. We also tried Flying Saucer, which also uses iText (version 2.0.8), but some items are not rendered properly there either. I appreciate you giving us your time.


Thank you very much!

Mark

On Sat, Oct 15, 2011 at 9:51 PM, Balder VC <li...@redlab.be> wrote:


    Hi,

    A PDF is not a browser; while creating your HTML you should still
    bear in mind that the end result will be a PDF.

    A couple of tips:
    It's better to write measures in points (pt); then no conversion
    is done by the XMLWorker.
    It's a good idea to check the supported tags (the defaults are
    listed in the documentation and in com.itextpdf.tool.xml.html.Tags).
    The <font> tag used in the HTML file is not supported; that is
    why, among other things, the 'Test' text is missing. You can easily
    write a TagProcessor that supports the font tag as you see fit.
    If I'm correct, it is better to define a width for your tables;
    then the XMLWorker does not have to try to fit the text into them.
    Nesting tables is possible, but it makes it harder for XMLWorker
    to fit tables on the page.
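Converting the sizes in the HTML to points yourself is a matter of fixed ratios. The sketch below is a hypothetical helper that uses the CSS 2.1 definitions of the absolute units (1in = 2.54cm = 25.4mm = 72pt = 6pc = 96px); the 0.75 factor for px comes from that definition and is not necessarily the factor XMLWorker applies internally, which is exactly why writing pt in the first place is the safer choice.

```java
import java.util.Locale;

// Hypothetical helper: converts a CSS absolute length such as "12px",
// "1in" or "2.54cm" to points, using the CSS 2.1 unit ratios.
final class CssLengthToPt {
    private CssLengthToPt() {}

    static double toPt(String length) {
        String s = length.trim().toLowerCase(Locale.ROOT);
        // Expect a number followed by a two-letter unit suffix.
        if (s.length() < 3) {
            throw new IllegalArgumentException("No number or unit in: " + length);
        }
        String unit = s.substring(s.length() - 2);
        double value = Double.parseDouble(s.substring(0, s.length() - 2).trim());
        switch (unit) {
            case "pt": return value;
            case "px": return value * 0.75;        // 96px = 72pt
            case "in": return value * 72.0;        // 1in = 72pt
            case "cm": return value / 2.54 * 72.0; // 2.54cm = 1in
            case "mm": return value / 25.4 * 72.0; // 25.4mm = 1in
            case "pc": return value * 12.0;        // 1pc = 12pt
            default: throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }
}
```

For example, a table declared as 600px wide would become 450pt.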



    Regards
    Balder


    On 14/10/2011 8:33, Mark Ramos wrote:

    Hi,

    Thanks for the links Balder.

    I tried to render the enclosed HTML file to PDF and did not get
    a good result. Please check the attachments.

    I used this code snippet:

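            import java.io.FileOutputStream;
            import java.io.FileReader;
            import com.itextpdf.text.Document;
            import com.itextpdf.text.PageSize;
            import com.itextpdf.text.pdf.PdfWriter;
            import com.itextpdf.tool.xml.XMLWorkerHelper;

            // Letter-sized PDF written to html3.pdf
            Document document = new Document(PageSize.LETTER);
            PdfWriter instance = PdfWriter.getInstance(document,
                    new FileOutputStream("/home/mramos/html3.pdf"));
            document.open();
            // Parse the HTML file and add the resulting elements to the document
            FileReader br = new FileReader("/home/mramos/pdf_cfadmin2.html");
            XMLWorkerHelper.getInstance().parseXHtml(instance, document, br);
            br.close();
            document.close();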

    Any help is much appreciated.


    Many thanks!




--
twitter <http://twitter.com/redlabbe>
redlab-log <http://www.redlab.be/blog/>

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

