Hi, after your comments, I now think I have to split my project into two parts:

1/ The first part has to parse the message and write an HTML or XHTML page representing the output I want for the message
2/ The second part has to render the previously generated HTML to PDF

I tried Flying Saucer in the past. It can generate PDF, but it needs strict XHTML as input, and lots of mails are not strict XHTML. On the one hand, I think I can improve my parser to get the HTML I want for most of the mails I have to transform. On the other hand, I don't know the OpenOffice SDK, WebKit or Mozilla, and HTML rendering will be the hardest part.
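
A minimal sketch of the first part, assuming the parser has already extracted the header values and a plain-text body. The names `MessagePage`, `escape` and `render` are hypothetical; the only point is that escaping every value keeps the generated page well-formed, which a strict XHTML renderer requires.

```java
import java.util.Map;

/** Hypothetical helper: builds a well-formed XHTML summary page for one message. */
final class MessagePage {

    /** Escapes the XML special characters so arbitrary mail text cannot break the markup. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Renders the headers as a table and the body as preformatted text. */
    static String render(Map<String, String> headers, String plainBody) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head><title>");
        sb.append(escape(headers.getOrDefault("Subject", "(no subject)")));
        sb.append("</title></head>\n<body>\n<table>\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append("<tr><th>").append(escape(h.getKey())).append(":</th><td>")
              .append(escape(h.getValue())).append("</td></tr>\n");
        }
        sb.append("</table>\n<pre>").append(escape(plainBody)).append("</pre>\n");
        sb.append("</body>\n</html>\n");
        return sb.toString();
    }
}
```

Control characters that are not allowed in XML (for example NUL) are not handled here and would still have to be filtered out.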

Thanks,
Benoît

On 24.01.2011 18:16, Eric Charles wrote:
Hi,

fyi
I also used Java/Mozilla integration via JavaXPCOM, which requires some investment from the developer (API changes, ...). An alternative is to use an HTML to PDF add-on and call it from XUL with a Java/XULRunner integration.
I also used Flying Saucer but didn't know it was able to generate PDF.
For your use case, there's also the OpenOffice SDK, which is really well documented and supports a wide range of input/output document formats (HTML, PDF, ...).

Tks,

Eric


On 24/01/2011 15:09, Noss Benoit wrote:
Thanks for your comments, Stefano. I will look in the directions you suggested and keep you informed (if you want).

Benoît


On 24.01.2011 11:57, Stefano Bagnara wrote:
2011/1/24 Noss Benoit<benoit.n...@secu.lu>:
Hi Stefano,
thanks for your answer. In the past, I already tried to do this with the
javax.mail.Message class.
It was not a big success: I found lots of issues due to the variety of
incoming mails, so it couldn't go into production.
You can tweak JavaMail with some system properties to let it parse
more malformed messages.
I say this because I think JavaMail is OK for this work, too.
Mime4j may be a little simpler, but I'm not sure it is worth porting your
code if you already have JavaMail code ready.
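
As an illustration of such system properties, here is a hedged sketch. The property names come from the `javax.mail.internet` package documentation, but which ones are available depends on the JavaMail version, so they should be checked against the version in use. The class `LenientJavaMail` is hypothetical. JavaMail reads many of these once, when its classes are first loaded, so they should be set before any JavaMail class is touched (or passed as `-D` options on the command line).

```java
/** Hypothetical helper: relaxes JavaMail's MIME parsing for malformed mail. */
final class LenientJavaMail {

    /** Call once at startup, before any JavaMail class is used. */
    static void apply() {
        // Accept address headers that do not strictly follow RFC 822.
        System.setProperty("mail.mime.address.strict", "false");
        // Decode RFC 2047 encoded words even when they appear inside other text.
        System.setProperty("mail.mime.decodetext.strict", "false");
        // Tolerate Content-Type parameters that are not properly quoted.
        System.setProperty("mail.mime.parameters.strict", "false");
        // Skip over invalid characters in base64 bodies instead of failing.
        System.setProperty("mail.mime.base64.ignoreerrors", "true");
        // Treat an unknown Content-Transfer-Encoding as plain data.
        System.setProperty("mail.mime.ignoreunknownencoding", "true");
        // Decode RFC 2047 encoded attachment file names in getFileName().
        System.setProperty("mail.mime.decodefilename", "true");
    }
}
```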

With either one you will still have to deal with MIME parts manually and
decide what to do with each part (mime4j removes the complexity of the
activation framework and the automatic object decoding done by JavaMail).

With each parsed Message, I tried to build in parallel an XHTML page
representing its content (From:, To:, Subject:, Date: and body content).
When the attachment was a message, I recursively went into it and appended
the info found to the XHTML I had previously created.
When I found HTML, I tried to transform it to XHTML with Tidy, then to PDF
with iText.
When the XHTML transformation failed and I had
a multipart/alternative, I rendered the text part to PDF instead.
When I found attached images, I rendered them to PDF.
When I found office documents, I didn't transform them.
After that I merged all the created PDFs into one big PDF and checked it in to
the Documentum DB (one PDF per message).
For XHTML to PDF rendering you may want to evaluate xhtmlrenderer (aka
Flying Saucer).
It is the best pure Java XHTML renderer out there: it is not close to
real web browsers, but much better than the other Java renderers I tested.
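
The part-by-part processing described above can be sketched as a recursive walk. This is only an illustration: `Part` is a hypothetical, simplified stand-in for a parsed MIME part, not a JavaMail or mime4j type, and the walk picks the body of a multipart/alternative by type only (HTML first, plain text otherwise). Real code would also fall back to the text part when the XHTML conversion of the HTML part fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartWalker {

    /** Hypothetical, simplified stand-in for a parsed MIME part. */
    static final class Part {
        final String mimeType;
        final String text;         // content of text/* leaves, otherwise null
        final List<Part> children; // sub-parts of multipart/*, or the nested message

        Part(String mimeType, String text, Part... children) {
            this.mimeType = mimeType;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns the leaves to render, in document order. */
    static List<Part> collect(Part root) {
        List<Part> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(Part p, List<Part> out) {
        if (p.mimeType.equals("multipart/alternative")) {
            // Render only one alternative: HTML if present, else plain text.
            Part chosen = firstOfType(p.children, "text/html");
            if (chosen == null) {
                chosen = firstOfType(p.children, "text/plain");
            }
            if (chosen != null) {
                out.add(chosen);
            }
        } else if (p.mimeType.startsWith("multipart/")
                || p.mimeType.equals("message/rfc822")) {
            // Containers and attached messages: recurse into their content.
            for (Part child : p.children) {
                walk(child, out);
            }
        } else {
            out.add(p); // text, image, office document, ...
        }
    }

    private static Part firstOfType(List<Part> parts, String mimeType) {
        for (Part p : parts) {
            if (p.mimeType.equals(mimeType)) {
                return p;
            }
        }
        return null;
    }
}
```

Each collected leaf can then be dispatched by type (text to the XHTML page, images to PDF, office documents left alone).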

The aim of the project is not to have a pretty rendering of all mail, it's
just to keep track of messages our client sent.

I faced three big issues:
**************************
0/ multipart/mixed with inline image content in "cid:...."
Sure, you have to do manual work with this. Look for parts with a
Content-ID and alter the references in the HTML URLs to link to these
objects.
Depending on your rendering engine you should be able to plug in your own
URL resolver and intercept cid: URLs to provide the streams from the
appropriate MIME parts (I do that using Flying Saucer).
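
A minimal sketch of the first approach (altering the references in the HTML), using only the standard library. It assumes the parts with a Content-ID have already been saved somewhere and that the caller supplies a map from Content-ID to the URL to use instead; the class `CidRewriter` and its methods are hypothetical. It only handles quoted `src`, `href` and `background` attributes and does not touch URL-encoded cid values or CSS `url(...)` references.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: replaces cid: references in HTML with caller-supplied URLs. */
final class CidRewriter {

    // Groups: 1 = attribute name, 2 = quote character, 3 = Content-ID.
    private static final Pattern CID_REF = Pattern.compile(
            "(?i)\\b(src|href|background)\\s*=\\s*([\"'])cid:([^\"']+)\\2");

    /** Strips the angle brackets of a Content-ID header value: "<a@b>" becomes "a@b". */
    static String normalizeContentId(String headerValue) {
        String v = headerValue.trim();
        if (v.length() >= 2 && v.startsWith("<") && v.endsWith(">")) {
            v = v.substring(1, v.length() - 1);
        }
        return v;
    }

    /** Rewrites known cid: references; unknown ones are left untouched. */
    static String rewrite(String html, Map<String, String> urlByContentId) {
        Matcher m = CID_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = urlByContentId.get(m.group(3));
            String replacement = (target == null)
                    ? m.group()
                    : m.group(1) + "=" + m.group(2) + target + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The map keys should be built with `normalizeContentId`, since the Content-ID header carries angle brackets while the cid: URL does not.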

1/ As you said, HTML to PDF rendering is difficult, and (Tidy + iText or
multipart/alternative) did not always work.
    If only I could use the Mozilla components to render it, but my
understanding of them is not good enough.
You can use Mozilla components or even WebKit: just google and you
will find information. I preferred Flying Saucer because I don't want
to run X (even Xvfb) on my servers for this task.

2/ Special characters and encoding problems in headers and attached file names
I've had issues only with oriental encodings: they are difficult to
support in Flying Saucer. No problems with European encodings.

Stefano