2011/1/24 Noss Benoit<benoit.n...@secu.lu>:
Hi Stefano,
Thanks for your answer. In the past, I already tried to do this with the
javax.mail.Message class.
It was not a big success: I found lots of issues due to the variety of
incoming mails, so I couldn't get it into production.
You can tweak JavaMail with some system properties to let it parse
more malformed messages.
I say this because I think JavaMail is OK for this work, too.
Mime4j may be a little simpler, but I'm not sure it is worth porting your
code if you already have JavaMail code ready.
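As a rough sketch of the system property tweak: the property names below are the ones documented for the javax.mail.internet package, but which of them your JavaMail version supports, and which are worth enabling for your mails, is an assumption to check. The class name LenientMimeSettings is only illustrative. Set them early (JVM startup or -D options), since as far as I know several are read only once.

```java
// Illustrative helper: switches on some of JavaMail's lenient parsing options.
class LenientMimeSettings {

    // Flags that make the parser tolerate broken input when set to "true".
    static final String[] IGNORE_ERROR_FLAGS = {
        "mail.mime.base64.ignoreerrors",    // tolerate garbage in base64 bodies
        "mail.mime.uudecode.ignoreerrors",  // tolerate errors in uuencoded bodies
        "mail.mime.ignoreunknownencoding",  // unknown Content-Transfer-Encoding is treated as 8bit
        "mail.mime.multipart.allowempty",   // accept multiparts without any body part
    };

    /** Sets the properties; call this before the first message is parsed. */
    static void apply() {
        for (String flag : IGNORE_ERROR_FLAGS) {
            System.setProperty(flag, "true");
        }
        // These two are strict by default and are relaxed with "false".
        System.setProperty("mail.mime.decodetext.strict", "false"); // decode encoded words found mid-word
        System.setProperty("mail.mime.parameters.strict", "false"); // accept unquoted special chars in parameters
    }
}
```

Calling LenientMimeSettings.apply() once at startup is enough; the same values can also be passed on the command line, e.g. -Dmail.mime.base64.ignoreerrors=true.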
With both you will anyway have to manually deal with MIME parts and
decide what to do with each part (mime4j removes the complexity of the
activation framework and the automatic object decoding done by JavaMail).
With each parsed Message, I tried to build in parallel an XHTML page
representing its content (From:, To:, Subject:, Date: and body content).
When the attachment was a message, I recursively went into it and appended
the info found to the XHTML I had previously created.
When I found HTML, I tried to transform it to XHTML with Tidy, then to PDF
with iText.
When the XHTML transformation failed and I had a multipart/alternative,
I rendered the text part to PDF instead.
When I found attached images, I rendered them to PDF.
When I found office documents, I didn't transform them.
After that I merged all the created PDFs into one big PDF and checked it in
to the Documentum DB (one PDF per message).
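The per-part decisions described above could be sketched as a small dispatch on the Content-Type value. This is only an illustration: the class PartDispatcher and the Action names are hypothetical, and the fallback from failed HTML conversion to the text alternative is left out.

```java
import java.util.Locale;

// Illustrative only: maps a Content-Type header value to a handling step.
class PartDispatcher {

    /** Hypothetical names for the handling steps listed in the text. */
    enum Action { RECURSE_INTO_MESSAGE, HTML_TO_PDF, TEXT_TO_PDF, IMAGE_TO_PDF, KEEP_AS_IS }

    /** Picks an action from a raw header value such as "text/html; charset=utf-8". */
    static Action actionFor(String contentType) {
        if (contentType == null) {
            return Action.TEXT_TO_PDF; // a missing Content-Type defaults to text/plain
        }
        // Drop the parameters and normalise case before comparing.
        int semicolon = contentType.indexOf(';');
        String base = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim().toLowerCase(Locale.ROOT);

        if (base.equals("message/rfc822")) return Action.RECURSE_INTO_MESSAGE;
        if (base.equals("text/html"))      return Action.HTML_TO_PDF;
        if (base.startsWith("text/"))      return Action.TEXT_TO_PDF;
        if (base.startsWith("image/"))     return Action.IMAGE_TO_PDF;
        return Action.KEEP_AS_IS;          // office documents and anything else
    }
}
```

Keeping the decision separate from the rendering code makes it easy to log what was done with each part, which helps when a new kind of mail shows up.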
For XHTML to PDF rendering you may want to evaluate xhtmlrenderer (aka
Flying Saucer).
It is the best pure Java XHTML renderer out there: it is not close to
real web browsers, but much better than the other Java renderers I tested.
The aim of the project is not to have a pretty rendering of all mail; it's
just to keep track of messages our client sent.
I faced three big issues:
**************************
0/ multipart/mixed with inline image content in "cid:...."
Sure, you have to do manual work for this. Look for parts with a
Content-ID header and alter the references in the HTML URLs to link to
these objects.
Depending on your rendering engine, you should be able to plug in your own
URL resolver and intercept cid: URLs to provide the streams from the
appropriate MIME parts (I do that using Flying Saucer).
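A minimal sketch of the first approach (rewriting the references in the HTML), assuming you have already saved each part that has a Content-ID somewhere and know its URL. CidRewriter and its method names are hypothetical; percent-encoded cid: URLs are not handled here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: replaces cid: references in HTML with URLs of extracted parts.
class CidRewriter {

    // A cid: URL runs up to the next quote, whitespace, '>' or ')'.
    private static final Pattern CID_URL =
            Pattern.compile("\\bcid:([^\"'\\s>)]+)", Pattern.CASE_INSENSITIVE);

    /** Strips the angle brackets around a Content-ID header value. */
    static String normalizeContentId(String headerValue) {
        String id = headerValue.trim();
        if (id.startsWith("<") && id.endsWith(">")) {
            id = id.substring(1, id.length() - 1);
        }
        return id;
    }

    /** Rewrites known cid: references; unknown ones are left untouched. */
    static String rewrite(String html, Map<String, String> urlByContentId) {
        Matcher m = CID_URL.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = urlByContentId.get(m.group(1));
            String replacement = (target != null) ? target : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The map keys are the Content-ID values without their angle brackets, since that is the form used in cid: URLs. With a pluggable URL resolver you would do the same lookup, but return the part's stream instead of rewriting the HTML.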
1/ Like you said, HTML to PDF rendering is difficult, and (Tidy + iText or
multipart/alternative) did not always work.
If only I could use the Mozilla components to render it, but my
understanding of them is not good enough.
You can use Mozilla components or even WebKit: just google and you
will find information. I preferred Flying Saucer because I don't want
to run X (even Xvfb) on my servers for this task.
2/ Special characters and encoding problems in headers and attached file
names
I've had issues only with oriental encodings: they are difficult to
support in Flying Saucer. No problems with European encodings.
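For the header side of this, JavaMail has MimeUtility.decodeText, and the mail.mime.decodefilename system property makes getFileName decode attachment names. To show what that decoding involves, here is a simplified, standard-library-only sketch of RFC 2047 encoded-word decoding. EncodedWordDecoder is a hypothetical name, and RFC 2231 parameter encoding (also used for file names) is not covered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: decodes =?charset?B|Q?text?= words in a header value.
class EncodedWordDecoder {

    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    static String decode(String header) {
        // Whitespace between two adjacent encoded words is not part of the text.
        String input = header.replaceAll("(\\?=)\\s+(?==\\?)", "$1");
        Matcher m = ENCODED_WORD.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded;
            try {
                byte[] bytes = m.group(2).equalsIgnoreCase("B")
                        ? Base64.getDecoder().decode(m.group(3))
                        : decodeQ(m.group(3));
                decoded = new String(bytes, Charset.forName(m.group(1)));
            } catch (IllegalArgumentException e) {
                decoded = m.group(); // unknown charset or bad data: keep the raw word
            }
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Q encoding: '_' means space, "=XX" is a hex byte, the rest is literal. */
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length() + 1 && i + 2 <= text.length() - 1) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }
}
```

Whether the decoded text then renders correctly is a separate question: that depends on the fonts available to the renderer.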
Stefano