Hi Justin
Just a note to keep you up to date on where I'm at, and maybe provoke some response:
I've been reading more and more, and thinking and drawing architectural diagrams ... it's more complicated than I first thought. I think this is an issue for your project too.
Currently I'm still thinking that the JavaMail access should be a Source, not implementing XMLizable, i.e. it would provide messages in the "message/rfc822" format. So that hasn't changed.
The parsing to XML should probably be done by either an XMLizer, or a Generator. Or possibly, another Source, layered on top of the MIME source. This is the issue that's exercising me at the moment. The fact is that a MIME message is not a simple data type: it is in fact a kind of file-system, with other files and directories in it (the MIME-parts).
The big thing which I'd been neglecting was how to provide access to these MIME-parts (i.e. a PART of a message) to the relevant components of Cocoon. For instance, a MIME message may contain a part which is in "text/html" format, and this html may refer (with an IMG tag) to an image which is in another part of the MIME message, with an "image/gif" mime-type for instance. To render this doc to a web-browser as html, or as PDF or whatever, it will be necessary for the Cocoon pipeline to extract this gif image from within the message, and feed it to the browser or Batik, as required. Concretely, the web app will have to generate a web page containing an IMG whose src is some URL which Cocoon can then use to find the gif image inside that particular message.
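To make the lookup concrete, here is a rough Python sketch, with the standard email package standing in for JavaMail. The function names, the sample message and the placeholder image bytes are all made up for illustration; it only shows the idea of finding the part that a "cid:" reference in the html points at.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder bytes standing in for a real GIF file (assumption for the demo).
GIF_BYTES = b"GIF89a" + bytes(10)


def build_sample_message() -> bytes:
    """Build a message whose HTML body refers to an inline image by cid:."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image demo"
    msg.set_content(
        '<html><body><img src="cid:logo@example.org"></body></html>',
        subtype="html",
    )
    # add_related turns the message into multipart/related:
    # the HTML part first, the image part second.
    msg.add_related(GIF_BYTES, maintype="image", subtype="gif",
                    cid="<logo@example.org>")
    return msg.as_bytes()


def find_part_by_cid(raw: bytes, cid: str):
    """Return (mime_type, decoded_bytes) for the part with this Content-ID,
    or None if the message has no such part."""
    msg = message_from_bytes(raw, policy=policy.default)
    wanted = "<" + cid + ">"  # the header wraps the id in angle brackets
    for part in msg.walk():
        if part.get("Content-ID", "").strip() == wanted:
            return part.get_content_type(), part.get_payload(decode=True)
    return None


raw_message = build_sample_message()
found = find_part_by_cid(raw_message, "logo@example.org")
```

Whatever URL the web app puts in the IMG tag would have to carry enough information to do this kind of lookup: which message, and which part inside it.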
Much the same applies to MIME-parts which contain message attachments of some arbitrary mime-type ("application/octet-stream" is a good one) which Cocoon can't do anything useful with, but which a browser might understand, or at least download as files.
I have to do this because my list archive needs to handle attachments.
It seems to me I've got 2 main options:
1) XMLize everything
Currently I'm tending towards using an XMLizer which will convert a "message/rfc822" byte-stream into SAX events (possibly using the XMSG schema rather than the XMTP schema I've used before: http://www.w3.org/TR/xmsg/ - I'm not sure about this yet).
This XMLizer would handle all the MIME-parts (even non-XML parts would be returned as "lumps" of data) and these could therefore be handled using the various XML-processing mechanisms: XInclude, XSLT, etc, even without necessarily being able to process their actual contents. So a GIF image MIME-part would appear as a <data> element in the SAX stream: http://www.w3.org/TR/xmsg/#N632 containing some text-encoded GIF data (i.e. Base64 encoded). For a binary MIME-part, Cocoon processing would be limited to "routing" it through to the browser, without transforming it on the way. To use this technique, we'd also need a MIMEPartSerializer which would decode this part back into a binary stream, for return to a browser.
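A rough sketch of that round trip, in Python for brevity (the standard email package stands in for JavaMail, and an ElementTree tree stands in for the SAX stream). The element and attribute names ("part", "text", "data", "content-type", "encoding") are placeholders of mine, not the XMSG vocabulary, and serialize_part is only a stand-in for what a MIMEPartSerializer would do.

```python
import base64
import xml.etree.ElementTree as ET
from email import message_from_bytes, policy
from email.message import EmailMessage

GIF_BYTES = b"GIF89a" + bytes(10)  # placeholder bytes, not a real image


def xmlize(raw: bytes) -> ET.Element:
    """Turn a message/rfc822 byte string into an XML tree of its parts."""
    msg = message_from_bytes(raw, policy=policy.default)
    return _xmlize_part(msg)


def _xmlize_part(part) -> ET.Element:
    elem = ET.Element("part", {"content-type": part.get_content_type()})
    if part.is_multipart():
        # Containers become nested <part> elements.
        for child in part.iter_parts():
            elem.append(_xmlize_part(child))
    elif part.get_content_maintype() == "text":
        # Text parts are carried as character data.
        ET.SubElement(elem, "text").text = part.get_content()
    else:
        # Anything else is an opaque lump of Base64 text.
        data = ET.SubElement(elem, "data", {"encoding": "base64"})
        data.text = base64.b64encode(part.get_payload(decode=True)).decode("ascii")
    return elem


def serialize_part(data_elem: ET.Element) -> bytes:
    """Decode a <data> element back into the original binary content."""
    return base64.b64decode(data_elem.text)


def build_sample_message() -> bytes:
    msg = EmailMessage()
    msg["Subject"] = "Attachment demo"
    msg.set_content("See the attached image.")
    msg.add_attachment(GIF_BYTES, maintype="image", subtype="gif",
                       filename="dot.gif")
    return msg.as_bytes()


root = xmlize(build_sample_message())
xml_text = ET.tostring(root, encoding="unicode")
gif_again = serialize_part(root.find(".//data"))
```

The sketch does not attempt the special handling of XML and HTML parts; it only shows that a binary part survives the trip through XML unchanged.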
Of course, MIME-parts of XML would be parsed fully, and mime-parts of HTML would be converted to XHTML with JTidy.
Using this approach, to refer to a MIME-part in the sitemap, you would generate the full message, then extract the part using a transformer for instance. There'd be no need to encode everything into the source url used in the sitemap, and this keeps the Source simple (means we can use a FileSource to read emails from individual files, too).
2) Handle non-XML parts in their native format
I'm not so clear on how this one would work, but I haven't yet ruled it out entirely ... I still need to get it clear in my head.
We'd need some component that would return a MIME-part from within a message, in a native format. It seems to me that it will need to implement Source (as far as Cocoon is concerned, this is the interface for reading a non-XML resource). But it must be able to get the MIME-part from either a file or url or from some kind of JavaMail source. So it would be a Source layered on top of some other source (AFAIK this would be a unique pattern in Cocoon, but not unreasonable given the nature of MIME-messages).
I'm not sure if this is really a runner: the MIME-parts contain more data than a Source could provide. For instance, a Content-Disposition. This is one reason why I'm inclining towards the XMLizer approach. The trouble is that Cocoon is set up for doing magic with XML, but non-XML data is either converted to XML or else just passed through with a Reader. There's no facility for "pipelines" of non-XML data.
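To show what I mean by a layered source, and where it falls short, here is a small Python sketch (again with the standard email package instead of JavaMail). BytesSource and MimePartSource are hypothetical names, and the two-member interface (a mime_type and an open() method) is only my simplification of what a Source offers, not Cocoon's actual interface.

```python
import io
from email import message_from_bytes, policy
from email.message import EmailMessage


class BytesSource:
    """Hypothetical base source handing out message/rfc822 bytes.
    In practice this could be a file, a URL or a JavaMail-backed source."""

    mime_type = "message/rfc822"

    def __init__(self, data: bytes):
        self._data = data

    def open(self) -> io.BytesIO:
        return io.BytesIO(self._data)


class MimePartSource:
    """Hypothetical source layered on another source: exposes one leaf part
    of the underlying message in its native format."""

    def __init__(self, base, index: int):
        msg = message_from_bytes(base.open().read(), policy=policy.default)
        # Number the non-container parts in document order.
        leaves = [p for p in msg.walk() if not p.is_multipart()]
        self._part = leaves[index]
        self.mime_type = self._part.get_content_type()

    def open(self) -> io.BytesIO:
        return io.BytesIO(self._part.get_payload(decode=True))


sample = EmailMessage()
sample["Subject"] = "Layered source demo"
sample.set_content("Patch attached.")
sample.add_attachment(b"\x00\x01\x02", maintype="application",
                      subtype="octet-stream", filename="blob.bin")

base = BytesSource(sample.as_bytes())
attachment = MimePartSource(base, 1)  # leaf 0 is the text, leaf 1 the attachment
payload = attachment.open().read()
```

The attachment's filename and Content-Disposition are still there in the part's headers, but a consumer that only sees mime_type and open() never gets them, which is the limitation described above.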
Whew! I've got another busy day today for another client and may not get anything done on it, but over the weekend I'll spend some more time on it and hopefully begin some actual programming work.
I've also been trying to define a URL scheme for referring to JavaMail resources. This also relates to the "cid:" and "mid:" schemes which are used for hyperlinks within a given MIME message, though as I said, I'd prefer to leave this out of the javamail or pop or whatever URLs and deal with it inside the Cocoon pipeline.
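For reference, the "cid:" and "mid:" forms (defined in RFC 2392) are simple enough to take apart inside the pipeline. A minimal Python sketch; the function name and the (message_id, content_id) return shape are my own choices, and it does not cover the JavaMail URL scheme, which isn't defined yet.

```python
from urllib.parse import unquote


def parse_mime_url(url: str):
    """Split a cid: or mid: URL into (message_id, content_id).

    cid:<content-id>               -> (None, content_id)
    mid:<message-id>               -> (message_id, None)
    mid:<message-id>/<content-id>  -> (message_id, content_id)

    The ids are percent-decoded and returned without angle brackets.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or not rest:
        raise ValueError("not a cid: or mid: URL: " + url)
    scheme = scheme.lower()
    if scheme == "cid":
        return None, unquote(rest)
    if scheme == "mid":
        message_id, _, content_id = rest.partition("/")
        return unquote(message_id), (unquote(content_id) if content_id else None)
    raise ValueError("unsupported scheme: " + scheme)


def content_id_header(content_id: str) -> str:
    """Form the Content-ID header value that a decoded id corresponds to."""
    return "<" + content_id + ">"


whole_message = parse_mime_url("mid:1234@lists.example.org")
one_part = parse_mime_url("mid:1234@lists.example.org/logo@example.org")
same_message_part = parse_mime_url("cid:logo%25big@example.org")
```

A "cid:" URL on its own only makes sense relative to the message it appears in, which is one more reason to resolve it inside the pipeline rather than in the source URL.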
It also relates to how to represent the contents of a JavaMail FOLDER to Cocoon: either directly as XML, or with mime-type "multipart/report" http://www.ohse.de/uwe/rfc/rfc1892.html or "multipart/digest" http://deesse.univ-lemans.fr:8003/Connected/RFC/1521/19.html, which could then be XMLized with a Cocoon XMLizer. See http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html for details of multipart formats.
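As an illustration of the "multipart/digest" option, a folder could be packed up roughly like this. This is a Python sketch with the standard email package in place of JavaMail; folder_to_digest, list_subjects and make_message are made-up names, and the "folder" here is just a list of message/rfc822 byte strings.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart


def folder_to_digest(raw_messages) -> bytes:
    """Wrap each message/rfc822 byte string as one part of a multipart/digest."""
    digest = MIMEMultipart("digest")
    for raw in raw_messages:
        digest.attach(MIMEMessage(message_from_bytes(raw)))
    return digest.as_bytes()


def list_subjects(digest_bytes: bytes):
    """Read the digest back and return the Subject of each enclosed message."""
    digest = message_from_bytes(digest_bytes, policy=policy.default)
    # Each part is message/rfc822; its payload is a one-element list
    # holding the enclosed message.
    return [str(part.get_payload(0)["Subject"]) for part in digest.iter_parts()]


def make_message(subject: str) -> bytes:
    """Build a tiny sample message (demo data only)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content("Body of " + subject)
    return msg.as_bytes()


folder = [make_message("First post"), make_message("Re: First post")]
digest_bytes = folder_to_digest(folder)
subjects = list_subjects(digest_bytes)
```

The same XMLizer that handles a single message could then walk the digest, since each part is just another "message/rfc822".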
Anyway ... I'm off now to have some lunch and then I have to visit a client. I hope your work is going ok.
Con