Hey, that is a cool project! Congratulations! I have one that I am still polishing for release; it transforms messages into JSON and then stores the JSON. My initial benchmarks on non-optimized code average 25,000 messages an hour, with the main bottleneck being the I/O. Cool to see what other people are doing.
Tony Z

On Fri, May 6, 2011 at 3:35 AM, Noss Benoit <benoit.n...@secu.lu> wrote:
> Hello Stefano and all the others who helped me,
>
> I worked with two students on a headless mail renderer (written in Java).
> I recently opened a project on SourceForge to share this experience
> (http://sourceforge.net/projects/mailtopdf/).
>
> The purpose is to render almost all mails (body + attachments) into one or
> more PDFs. The focus was not on a "sexy" rendition but on getting a
> rendition at all. Mails are read through IMAP or from a directory,
> rendered, and saved as PDF in an output directory. It uses OpenOffice and
> JAI in the background (for the attachments).
>
> I'm quite happy with the first results: it renders 98% of the mails with
> their attachments (mean PDF rendition time per mail = 300 ms on a normal
> machine).
>
> Just to let you know and to thank you again,
>
> Benoît NOSS
>
> On 25.01.2011 10:30, Stefano Bagnara wrote:
>> 2011/1/25 Noss Benoit <benoit.n...@secu.lu>:
>>> Hi, after your comments, I now think I have to split my project in two
>>> parts:
>>>
>>> 1/ The first part has to parse the message and write an HTML or XHTML
>>> page representing the output I want for the message
>>> 2/ The second part has to render the HTML I previously generated to PDF
>>
>> I do that in a single step because of the content-id "cid:" image
>> references.
>> BTW logically you need to separate components: parser and renderer.
>>
>>> I tried Flying Saucer in the past; it can generate PDF, but it needed
>>> strict XHTML for the input, and lots of mails are not strict XHTML
>>
>> I've had very good results parsing the HTML with the validator.nu parser:
>> http://about.validator.nu/htmlparser/
>>
>> I parsed thousands of HTML emails and tested most HTML parsers out there,
>> and validator.nu was the only one parsing them all.
>>
>>> On the one hand, I think I can improve my parser to get the HTML I want
>>> for most of the mails I have to transform.
>>> On the other hand, I don't know the OpenOffice SDK, WebKit and Mozilla,
>>> and HTML rendering will be the hardest part....
>>
>> If you used Flying Saucer in the past then go ahead with that.
>>
>> Stefano