Hello Stefano and all the others who helped me,

I worked with two students on a headless mail renderer (written in Java).
I recently opened a project on SourceForge to share this experience (http://sourceforge.net/projects/mailtopdf/).

The purpose is to render almost all mails (body + attachments) into one or more PDFs. The focus was not on a "sexy" rendition but on getting a rendition at all. Mails are read through IMAP or from a directory, rendered, and saved as PDF in an output directory. It uses OpenOffice and JAI in the background (for the attachments). I'm quite happy with the first results: it renders 98% of the mails with their attachments (mean PDF rendering time per mail = 300 ms on a normal machine).
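To illustrate the directory-based flow described above (read mails from a directory, render each one, save the PDF in an output directory), here is a minimal sketch. It is not the project's actual code: the `MailRenderer` interface, the `MailBatch` class, and the `.eml` file extension are assumptions made for the example, and the real rendering step (OpenOffice, JAI) is hidden behind the interface.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical renderer contract: turns one stored mail into one PDF file. */
interface MailRenderer {
    void render(Path mailFile, Path pdfFile) throws IOException;
}

/** Hypothetical batch driver for the "read from a directory" mode. */
class MailBatch {
    /** Maps "message.eml" to "message.pdf" inside the output directory. */
    static Path pdfPathFor(Path mailFile, Path outputDir) {
        String name = mailFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".pdf");
    }

    /** Renders every *.eml file in inputDir; returns how many rendered without an exception. */
    static int renderDirectory(Path inputDir, Path outputDir, MailRenderer renderer)
            throws IOException {
        Files.createDirectories(outputDir);
        int rendered = 0;
        try (DirectoryStream<Path> mails = Files.newDirectoryStream(inputDir, "*.eml")) {
            for (Path mail : mails) {
                try {
                    renderer.render(mail, pdfPathFor(mail, outputDir));
                    rendered++;
                } catch (IOException | RuntimeException e) {
                    // One unrenderable mail should not stop the whole batch.
                    System.err.println("Skipped " + mail + ": " + e.getMessage());
                }
            }
        }
        return rendered;
    }
}
```

Catching failures per mail rather than per batch matches the stated goal of getting "a rendition at all": a mail that cannot be rendered is logged and skipped, and the remaining mails are still processed.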

Just wanted to let you know, and to thank you again.


Benoît NOSS




On 25.01.2011 10:30, Stefano Bagnara wrote:
2011/1/25 Noss Benoit<benoit.n...@secu.lu>:
Hi, after your comments, I now think I have to split my project into two
parts:

1/ The first part has to parse the message and write an html or xhtml page
representing the output I want for the message
2/ The second part has to render the html I previously generated to PDF
I do that in a single step because of the content-id "cid:" image references.
BTW logically you need to separate components: parser and renderer.
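One possible way to keep the parser and the renderer separate while still handling "cid:" image references is to rewrite those references between the two steps. The sketch below is only an illustration under assumptions: the `CidRewriter` class is hypothetical, and it assumes the parsing step has already saved each inline part somewhere and knows a URI for it, keyed by Content-ID without the angle brackets.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites cid: image references in an HTML body. */
class CidRewriter {
    // Matches src="cid:..." or src='cid:...'; group 1 is the quote, group 2 the content-id.
    private static final Pattern CID_SRC =
            Pattern.compile("src\\s*=\\s*([\"'])cid:([^\"']+)\\1", Pattern.CASE_INSENSITIVE);

    /**
     * @param html             HTML body produced by the parsing step
     * @param partsByContentId maps a Content-ID to a URI the renderer can load
     */
    static String rewrite(String html, Map<String, String> partsByContentId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = partsByContentId.get(m.group(2));
            // References to unknown content-ids are left untouched.
            String replacement = (target == null)
                    ? m.group()
                    : "src=" + m.group(1) + target + m.group(1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this in place, the renderer only ever sees ordinary URIs and needs no knowledge of the MIME structure. A regular expression is used here only to keep the example short; on real-world mail HTML, walking the tree produced by a tolerant HTML parser would be more robust.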

I tried flying saucer in the past, it can generate PDF, but it needed strict
XHTML for the input, and lots of mails are not strict XHTML
I've had very good results parsing the html with validator.nu parser:
http://about.validator.nu/htmlparser/

I parsed thousands of HTML emails and tested most HTML parsers out there,
and validator.nu was the only one that parsed them all.

On the one hand, I think I can improve my parser to get the html I want for
most of the mails I have to transform.
On the other hand, I don't know the openoffice SDK, webkit and Mozilla, and
html rendering will be the hardest part....
If you used flying saucer in the past then go ahead with that.

Stefano





