Re: Headless mail renderer

Noss Benoit Mon, 24 Jan 2011 22:34:38 -0800

Hi, after your comments, I now think I have to split my project into two parts:

1/ The first part has to parse the message and write an HTML or XHTML page representing the output I want for the message

2/ The second part has to render the HTML I previously generated to PDF

I tried Flying Saucer in the past. It can generate PDF, but it needed strict XHTML as input, and lots of mails are not strict XHTML.

On the one hand, I think I can improve my parser to get the HTML I want for most of the mails I have to transform.

On the other hand, I don't know the OpenOffice SDK, WebKit or Mozilla, and HTML rendering will be the hardest part....


Thanks,
Benoît

On 24.01.2011 18:16, Eric Charles wrote:

Hi,

fyi

I also used Java/Mozilla integration via JavaXPCOM, which needs investment from the developer (API changes, ...). An alternative is to use an HTML-to-PDF add-on and call it from XUL with a Java/XULRunner integration.

I also used Flying Saucer but didn't know it was able to generate PDF.

For your use case, there's also the OpenOffice SDK, which is really well documented and supports a wide range of input/output document formats (HTML, PDF, ...).


Tks,

Eric


On 24/01/2011 15:09, Noss Benoit wrote:

Thanks for your comments Stefano, I will look in the directions you suggested and keep you informed (if you want to)


Benoît


On 24.01.2011 11:57, Stefano Bagnara wrote:

2011/1/24 Noss Benoit<benoit.n...@secu.lu>:

Hi Stefano,
thanks for your answer. In the past, I already tried to do this with the
javax.mail.Message class.
It was not a big success: I found lots of issues due to the variety of
incoming mails, so it couldn't go into production.

You can tweak javamail with some system properties to let it parse
more malformed messages.
I say this because I think javamail is ok for this work, too.
Mime4j may be a little simpler, but I'm not sure it is worth porting your
code if you already have javamail code ready.
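
As a rough illustration of the kind of tweak meant here, the sketch below sets a few JavaMail system properties that relax MIME parsing. The property names are the ones listed in the javax.mail.internet package documentation; which of them exist and what their defaults are depends on the JavaMail version, so treat the selection as an assumption to check against your release. The class name LenientMailSetup is made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LenientMailSetup {

    /** Properties chosen for illustration; verify each one for your JavaMail version. */
    static Map<String, String> lenientProperties() {
        Map<String, String> p = new LinkedHashMap<>();
        // Decode encoded-words even when they appear in the middle of other text.
        p.put("mail.mime.decodetext.strict", "false");
        // Decode encoded attachment file names returned by getFileName().
        p.put("mail.mime.decodefilename", "true");
        // Tolerate malformed Content-Type / Content-Disposition parameters.
        p.put("mail.mime.parameters.strict", "false");
        // Skip over errors in base64-encoded content instead of failing.
        p.put("mail.mime.base64.ignoreerrors", "true");
        return p;
    }

    /** Copies the properties above into the JVM-wide system properties. */
    static void apply() {
        lenientProperties().forEach(System::setProperty);
    }
}
```

Calling LenientMailSetup.apply() early, before any message is parsed, is the safe choice, since some of these settings may be read only once by the library.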

With both you will still have to deal manually with mime parts and
decide what to do with each part (mime4j removes the complexity of the
activation framework and automatic object decoding done by javamail).

With each parsed Message, I tried to build in parallel an XHTML page
representing its content (From: To: Subject: Date: and body content).
When the attachment was a message, I recursively went into it and appended
the info found to the XHTML I had previously created.
When I found HTML, I tried to transform it to XHTML with tidy, then to PDF
with iText.
When the XHTML transformation failed and I had
a multipart/alternative, I then rendered the text part to PDF.
When I found attached images, I rendered them to PDF.
When I found office documents, I didn't transform them.
After that I merged all the created PDFs into one big PDF and checked it in to
the Documentum DB (one PDF per message).
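
The first steps above (header summary, body, recursion into attached messages) can be sketched in plain Java as follows. This is only an illustration under assumptions: the Mail class is a hypothetical, already-parsed stand-in for whatever javamail or mime4j returns, bodies are treated as plain text, and characters that are illegal in XML (for example most control characters) are not filtered.

```java
import java.util.List;
import java.util.Map;

final class MessageSummaryPage {

    /** Hypothetical parsed message: headers, plain-text body, attached messages. */
    static final class Mail {
        final Map<String, String> headers;
        final String bodyText;
        final List<Mail> attachedMessages;

        Mail(Map<String, String> headers, String bodyText, List<Mail> attachedMessages) {
            this.headers = headers;
            this.bodyText = bodyText;
            this.attachedMessages = attachedMessages;
        }
    }

    /** Escapes the characters that would break XHTML markup. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Appends one message, then recurses into the messages attached to it. */
    private static void appendMail(StringBuilder out, Mail mail) {
        out.append("<div class=\"mail\">\n<dl>\n");
        for (Map.Entry<String, String> h : mail.headers.entrySet()) {
            out.append("<dt>").append(escape(h.getKey())).append("</dt>")
               .append("<dd>").append(escape(h.getValue())).append("</dd>\n");
        }
        out.append("</dl>\n<pre>").append(escape(mail.bodyText)).append("</pre>\n");
        for (Mail nested : mail.attachedMessages) {
            appendMail(out, nested);
        }
        out.append("</div>\n");
    }

    /** Builds a complete XHTML document for the top-level message. */
    static String render(Mail mail) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
           .append("<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head><title>")
           .append(escape(mail.headers.getOrDefault("Subject", "")))
           .append("</title></head>\n<body>\n");
        appendMail(out, mail);
        return out.append("</body>\n</html>\n").toString();
    }
}
```

Generating the page directly as escaped XHTML keeps the output well-formed regardless of what the headers contain, which matters if the renderer only accepts strict input.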

For XHTML to PDF rendering you may want to evaluate xhtmlrenderer (aka
Flying Saucer).
It is the best pure Java XHTML renderer out there: it is nowhere near
real web browsers, but much better than the other Java renderers I tested.

The aim of the project is not to have a pretty rendering of all mail, it's
just to keep track of messages our client sent.

I faced three big issues :
**************************
0/ multipart/mixed with inline image content in "cid:...."

Sure, you have to do manual work with this. Look for parts with
Content-ID and alter references in the html urls to link to these
objects.
Depending on your rendering engine you should be able to plug your own
url resolver and intercept cid: urls to provide the streams from the
appropriate mime parts (I do that using Flying Saucer)
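
A minimal sketch of the first approach (altering the references in the HTML), using only the JDK. It assumes the parts carrying a Content-ID have already been extracted somewhere and that a map from Content-ID to replacement URL is available; the class CidRewriter and its method names are invented for this example. It only looks at quoted attribute values and does not handle percent-encoded cid: URLs.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CidRewriter {

    // A cid: URL inside a single- or double-quoted attribute value.
    private static final Pattern CID_URL =
            Pattern.compile("(?i)([\"'])cid:([^\"'\\s>]+)\\1");

    /** Strips the angle brackets that surround a Content-ID header value. */
    static String normalizeContentId(String headerValue) {
        String id = headerValue.trim();
        if (id.length() >= 2 && id.startsWith("<") && id.endsWith(">")) {
            id = id.substring(1, id.length() - 1);
        }
        return id;
    }

    /**
     * Replaces every quoted cid: reference whose id is a key of the map
     * with the mapped URL. Unknown ids are left untouched.
     */
    static String rewrite(String html, Map<String, String> urlByContentId) {
        Matcher m = CID_URL.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String quote = m.group(1);
            String target = urlByContentId.get(m.group(2));
            String replacement = (target == null) ? m.group() : quote + target + quote;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The second approach mentioned above, a custom url resolver inside the rendering engine, avoids touching the HTML at all but depends on the engine's own extension points.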

1/ Like you said, HTML to PDF rendering is difficult, and (tidy+iText or
multipart/alternative) was not always working.
    If only I could use the Mozilla components to render it, but my
understanding of them is not good enough

You can use Mozilla components or even WebKit: just google and you
will find information. I preferred Flying Saucer because I don't want
to run X (even xvfb) on my servers for this task.

2/ Special characters and encoding problems in headers and attached filenames

I've had issues only with oriental encodings: they are difficult to
support in Flying Saucer. No problems with European encodings.
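
On the header side of issue 2/, non-ASCII text in headers normally arrives as RFC 2047 encoded-words such as =?ISO-8859-1?Q?...?=. The sketch below shows, in plain Java, roughly what decoding them involves. It is an illustration only, not what javamail or mime4j do internally: the class EncodedWordDecoder is invented for this example, only charsets known to the JVM are supported, and encoded-words that fail to decode are passed through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodedWordDecoder {

    // =?charset?encoding?text?=  where encoding is B (base64) or Q.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    /** Decodes every encoded-word found in a header value. */
    static String decode(String header) {
        Matcher m = ENCODED_WORD.matcher(header);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean previousWasEncoded = false;
        while (m.find()) {
            String gap = header.substring(last, m.start());
            String decoded;
            try {
                decoded = decodeWord(m.group(1), m.group(2), m.group(3));
            } catch (RuntimeException e) {
                decoded = null; // unknown charset or broken payload
            }
            if (decoded == null) {
                out.append(gap).append(m.group());
                previousWasEncoded = false;
            } else {
                // Whitespace between two adjacent encoded-words is dropped.
                if (!(previousWasEncoded && gap.trim().isEmpty())) {
                    out.append(gap);
                }
                out.append(decoded);
                previousWasEncoded = true;
            }
            last = m.end();
        }
        return out.append(header.substring(last)).toString();
    }

    private static String decodeWord(String charset, String encoding, String text) {
        int star = charset.indexOf('*'); // optional language suffix, e.g. utf-8*en
        if (star >= 0) {
            charset = charset.substring(0, star);
        }
        Charset cs = Charset.forName(charset);
        byte[] bytes = encoding.equalsIgnoreCase("B")
                ? Base64.getDecoder().decode(text)
                : decodeQ(text);
        return new String(bytes, cs);
    }

    /** Q encoding: '_' means space, '=XX' is a hex-encoded byte. */
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=') {
                if (i + 2 >= text.length()) {
                    throw new IllegalArgumentException("truncated escape");
                }
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }
}
```

Decoding only gets the text into Java strings; whether the PDF renderer has fonts covering the resulting characters is a separate question.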

Stefano
