Hi Eric,

   * OpenOffice is used to render Microsoft Office and OpenOffice
     attachments into PDF.
     OpenOffice renders HTML into PDF poorly.
   * iText is used to render XHTML to PDF.
     As Stefano proposed, convert the HTML into XHTML with the
     validator.nu parser (or with JTidy in my case), and then use
     Flying Saucer to make a PDF out of the XHTML.
     The Flying Saucer project internally uses iText 2.x (2.0.8 in my
     case) + iText 5.0.6

Benoît

On 06.05.2011 11:54, Eric Charles wrote:
Hi Benoît,

Many thanks for the feedback and contribution.

I just downloaded your zip and saw jodconverter (and the associated uno..., ju.. jars from the OpenOffice SDK) and the iText libs.

You also import jodconverter and iText classes in PDFConverterJAVA.

What would you advise for HTML/text to PDF conversion, based on your experience?

Thanks,
- Eric

On 6/05/2011 10:35, Noss Benoit wrote:
Hello Stefano and all the others who helped me,

I worked with two students on a headless mail renderer (written in Java).
I recently opened a project on SourceForge to share this experience
(http://sourceforge.net/projects/mailtopdf/)

The purpose is to render almost all mails (body + attachments) into one or
more PDFs. The focus was not on a "sexy" rendition but on getting a
rendition at all. Mails are read through IMAP or from a directory, rendered,
and saved as PDF in an output directory. It uses OpenOffice and JAI in the
background (for the attachments).
I'm quite happy with the first results: it renders 98% of the mails
with their attachments (mean PDF rendering time per mail = 300 ms on a
normal machine)

Just to let you know, and to thank you again


Benoît NOSS




On 25.01.2011 10:30, Stefano Bagnara wrote:
2011/1/25 Noss Benoit<benoit.n...@secu.lu>:
Hi, after your comments, I now think I have to split my project into two
parts:

1/ The first part has to parse the message and write an HTML or XHTML
page representing the output I want for the message
2/ The second part has to render the HTML I previously generated to PDF

I do that in a single step because of the content-id "cid:" image
references.
BTW logically you need to separate components: parser and renderer.
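
One way to keep the two stages apart despite the "cid:" references is to let the parser stage rewrite them to URIs the renderer can load by itself. This is only a minimal sketch, assuming the inline parts have already been saved somewhere and mapped by Content-ID (without the angle brackets); the class name CidRewriter and the method rewrite are made up for illustration and are not taken from any project mentioned in this thread.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites src="cid:..." attributes in an HTML mail body. */
public class CidRewriter {

    // Matches src="cid:ID" or src='cid:ID'; group 2 is the quote, group 3 the Content-ID.
    private static final Pattern CID_SRC = Pattern.compile(
            "(src\\s*=\\s*)([\"'])cid:([^\"']+)\\2", Pattern.CASE_INSENSITIVE);

    /**
     * Replaces each known cid: reference with the URI from cidToUri.
     * References whose Content-ID is not in the map are left untouched.
     */
    public static String rewrite(String html, Map<String, String> cidToUri) {
        Matcher m = CID_SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String uri = cidToUri.get(m.group(3));
            String replacement = (uri == null)
                    ? m.group()                                   // unknown id: keep as is
                    : m.group(1) + m.group(2) + uri + m.group(2); // same quote style
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The renderer stage then only sees ordinary image URIs and needs no knowledge of the MIME structure. A regex is enough for a sketch; a real parser stage would more likely rewrite the attributes on the parsed DOM.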

I tried Flying Saucer in the past; it can generate PDF, but it needed
strict XHTML for the input, and lots of mails are not strict XHTML

I've had very good results parsing the HTML with the validator.nu parser:
http://about.validator.nu/htmlparser/

I parsed thousands of HTML emails and tested most HTML parsers out there,
and validator.nu was the only one that parsed them all.

On the one hand, I think I can improve my parser to get the HTML I
want for most of the mails I have to transform.
On the other hand, I don't know the OpenOffice SDK, WebKit and
Mozilla, and HTML rendering will be the hardest part....

If you used Flying Saucer in the past, then go ahead with that.

Stefano








