Nicola Ken Barozzi wrote:


I have some problems I need to solve:

1) I want to use Cocoon as a bean in other programs, as an evolved xml processing bean. I want to create it, set input (stream), set output, and execute.

Problem: I cannot do it with the current Cocoon, without creating my specialized environment to put the input and a Generator that gets from that and generates.

2) I want to execute transformations of mails in a James mailet using Cocoon.

Problem: I have a similar problem to (1).
Cocoon doesn't get input directly from the request, unless from a webapp HttpServletRequest, which is not really feasible here.

3) The StreamGenerator depends on the servlet package, so Cocoon cannot compile without it. So much for blocks and reduced dependencies on core. Cocoon should be able to run in a smaller space, without depending on servlets.

Problem: When we use Cocoon in a container other than a servlet container (Avalon Phoenix, Mod-Cocoon), we lose the StreamGenerator functionality, or the portability of the sitemap, unnecessarily.


So I cannot get a stream from the Request in a standard way, though it's IMHO a reasonably common operation, that should thus be abstracted.

Vadim and I have discussed this a bit, and we found that:

1) not all env. have the need (CLI), but most do
I don't know much about the CLI, so I might be completely off track, but I believe that being able to connect the input stream to standard input or to a file from the command line could be useful for:

* Command line testing of web services.
* Writing tools that convert between different file formats.
* Populating your db from e.g. XML documents.

2) some env.s (mail) have multiple inputs
3) IMHO all env.s have a main input (mail content)
4) it's possible to get these from the Request as attributes instead of as a stream. This makes it more flexible because I can pass objects, and more than one. Standard entries can be added.
I would prefer to put it this way:

* Most (all?) environments (servlet, mail, possibly CLI; I don't know much about JMS) have an input stream.

* In some cases the input stream has multipart content. There are several different subtypes of MIME multipart:

- multipart/form-data, used in HTML forms and XForms
- multipart/mixed, used for email
- multipart/related, used for SOAP over mail. In the W3C working document "SOAP 1.2 Attachment Feature" they even mention DIME multipart messages as a possible format. (DIME is, IIRC, like MIME, but with some kind of part/size table at the beginning so that parts can be extracted without parsing the whole message.)
- application/x-www-form-urlencoded, used in HTML forms and XForms. It is not a subtype of multipart, but it is used for transmitting key/value pairs.

Most of these multipart formats describe unordered key/value pairs, but multipart/related is a little more complicated. It consists of a root document with references to the other parts, and the references can be both absolute and relative addresses.

* In the current implementation of the servlet environment, the input stream is parsed if it is of type application/x-www-form-urlencoded or multipart/form-data, and the key/value pairs are put in the request attributes.
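
To make the key/value parsing concrete, here is a minimal sketch of how an application/x-www-form-urlencoded body can be turned into key/value pairs. FormUrlEncodedParser is a hypothetical helper invented for illustration, not Cocoon code, and it assumes the body is UTF-8 encoded:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Cocoon.
final class FormUrlEncodedParser {
    private FormUrlEncodedParser() {}

    // Splits "a=1&b=2&b=3" into {a=[1], b=[2, 3]}, keeping repeated keys.
    static Map<String, List<String>> parse(String body) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray '&' separators
            }
            int eq = pair.indexOf('=');
            String rawKey = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(decode(rawKey), k -> new ArrayList<>())
                  .add(decode(rawValue));
        }
        return params;
    }

    // Assumption: the sender used UTF-8 for percent-encoding.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, FormUrlEncodedParser.parse("name=Nicola+Ken&lang=xml&lang=java") maps "name" to a one-element list and "lang" to a two-element list; an environment could then copy those entries wherever it keeps its request data.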

* IMO getting the input stream and parsing its (possibly) multipart content are different concerns. So let's make a getInputStream() method available in all environments, and let it be implemented as mimeMessage.getInputStream() for mail and request.getInputStream() for servlet. The multipart parsing could then be done in some source sub-protocols, e.g. multipartinput:related://foo/bar.gif, or in specialized modules, or maybe in a generator. Of course the current handling of multipart/form-data should be kept in the servlet environment, but I think it is too HTML-specific to be used as a model for all environments.
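
A rough sketch of that separation, assuming a hypothetical InputEnvironment interface (the name and all classes below are made up for illustration; the real Cocoon Environment and Request interfaces are not reproduced here). Each environment only says where its bytes come from, and the consumer never needs to know which environment it is talking to:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical interface: the one method every environment would expose.
interface InputEnvironment {
    InputStream getInputStream() throws IOException;
}

// Stand-in for a servlet or mail environment; a real one would delegate to
// request.getInputStream() or mimeMessage.getInputStream().
final class InMemoryEnvironment implements InputEnvironment {
    private final byte[] content;

    InMemoryEnvironment(String content) {
        this.content = content.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(content);
    }
}

// A command-line environment could simply hand out standard input.
final class StdinEnvironment implements InputEnvironment {
    @Override
    public InputStream getInputStream() {
        return System.in;
    }
}

// Consumer code (think of a generator) depends only on the interface.
final class InputConsumer {
    private InputConsumer() {}

    static String readAll(InputEnvironment env) {
        try {
            InputStream in = env.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Multipart parsing is deliberately absent from the sketch: it would live in whatever source, module or generator reads the stream.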

5) getting stuff from a Request attribute means that I need to parse the whole request. This increases memory usage, but is sometimes inevitable because of the protocol used (mail attachments)
Yes, not much to do about it. The multipart DIME format mentioned above is designed for random access without having to parse all of the input.

6) reading from a stream makes it easier to handle input->output transformations, typical of web services, efficiently.
Yes, this will most likely be the dominating use case, so let's focus on that. Even if the handling of multipart messages in the general case (i.e. outside the servlet environment, or using subtypes other than form-data) might also be important, it is IMO a different concern and can be handled later when somebody needs it.

So, from these, I seem to think that we could

1) add a getInputStream() to the Request
+1.

Today, the Request interface contains the methods getContentLength(), and getContentType(), so that you can ask about the length of and the type of the input stream but not the content of it. Strange IMHO.

Furthermore, I think that getInputStream() and the set and get methods for its length and content type should be in the Environment interface. From point 1) at the beginning of your mail I guess that you have the same opinion.

2) make other input features available through request attributes
This is more a question of how to implement the interface, as the Request interface already contains the necessary methods. As I said above, IMO multipart parsing should be a responsibility of sources, modules or generators, and not something that is done automatically by the environment.

In case of mail

1) the mail content goes through getInputStream()
2) the attachments go through Request attributes
getInputStream() returns the mime multipart stream.
- "input://" is connected to a source that also returns the mime multipart stream.
- "multipartinput:mixed://" is a TraversableSource and makes it possible to list the content of the multipart.
- "multipartinput:mixed://1" is the stream of the first attachment.
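
To illustrate how such URIs might be resolved, here is a small sketch. MultipartInputResolver is a hypothetical class made up for this example; it assumes the multipart stream has already been split into parts by some parser (not shown) and only demonstrates the listing and the 1-based addressing described above:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolver for illustration; not an existing Cocoon source.
final class MultipartInputResolver {
    static final String PREFIX = "multipartinput:mixed://";

    // Assumption: the parts were extracted beforehand by a MIME parser.
    private final List<byte[]> parts;

    MultipartInputResolver(List<byte[]> parts) {
        this.parts = new ArrayList<>(parts);
    }

    // "multipartinput:mixed://" behaves like a directory listing of the parts.
    List<String> list() {
        List<String> uris = new ArrayList<>();
        for (int i = 1; i <= parts.size(); i++) {
            uris.add(PREFIX + i);
        }
        return uris;
    }

    // "multipartinput:mixed://1" addresses the first part (1-based index).
    byte[] bytesOf(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Unsupported URI: " + uri);
        }
        int index;
        try {
            index = Integer.parseInt(uri.substring(PREFIX.length()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a part index: " + uri, e);
        }
        if (index < 1 || index > parts.size()) {
            throw new IllegalArgumentException("No such part: " + uri);
        }
        return parts.get(index - 1);
    }

    // Stream view of a single part, as a generator would consume it.
    InputStream open(String uri) {
        return new ByteArrayInputStream(bytesOf(uri));
    }
}
```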

/Daniel Fagerstrom


