2011/1/25 Noss Benoit <benoit.n...@secu.lu>:
> Hi, after your comments, I now think I have to split my project in two
> parts
>
> 1/ The first part has to parse the message and write an html or xhtml page
> representing the output I want for the message
> 2/ The second part has to render the html I previously generated to PDF

I do that in a single step because of the content-id "cid:" image
references. That said, logically you do need to separate the two components:
parser and renderer.

> I tried flying saucer in the past, it can generate PDF, but it needed strict
> XHTML for the input, and lots of mails are not strict XHTML

I've had very good results parsing the HTML with the validator.nu parser:
http://about.validator.nu/htmlparser/

I parsed thousands of HTML emails and tested most of the HTML parsers out
there, and validator.nu was the only one that parsed them all.

> On the one hand, I think I can improve my parser to get the html I want for
> most of the mails I have to transform.
> On the other hand, I don't know the openoffice SDK, webkit and Mozilla, and
> html rendering will be the hardest part....

If you have used Flying Saucer in the past, then go ahead with that.

Stefano
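
To illustrate the "cid:" issue mentioned above: if the parser and renderer are split into two steps, the image references in the intermediate HTML have to be rewritten to something the renderer can load. The following is only a rough sketch under stated assumptions, not the code I actually use. The class name CidRewriter, the method rewrite and the map of content-ids to URIs are all hypothetical, and it uses a plain regex from the Java standard library to stay self-contained. In a real pipeline you would more likely rewrite the attributes on the DOM produced by the HTML parser.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces src="cid:..." references with URIs that a
// renderer can resolve (for example, files where the MIME parts were saved).
public class CidRewriter {
    // Matches src="cid:ID" or src='cid:ID'; group 3 is the content-id.
    private static final Pattern CID_SRC = Pattern.compile(
            "(src\\s*=\\s*)([\"'])cid:([^\"']+)\\2", Pattern.CASE_INSENSITIVE);

    // partUris maps a content-id (without angle brackets) to a replacement URI.
    // This mapping is an assumption; the caller builds it while parsing the mail.
    public static String rewrite(String html, Map<String, String> partUris) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String uri = partUris.get(m.group(3));
            // Unknown content-ids are left untouched.
            String replacement = (uri == null)
                    ? m.group()
                    : m.group(1) + m.group(2) + uri + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Logo: <img src=\"cid:logo@example\"></p>";
        Map<String, String> parts = Map.of("logo@example", "file:///tmp/logo.png");
        System.out.println(rewrite(html, parts));
    }
}
```

The rewritten HTML can then be handed to whichever renderer you choose. Doing everything in a single step, as I do, avoids this rewrite because the renderer can ask for the MIME part directly.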