Re: data goes in, data goes out

Stefano Mazzocchi Thu, 29 Nov 2001 10:27:33 -0800

Gianugo Rabellino wrote:

> I have a basic concern here. URLs are, as the name suggests *locators*
> or *identifiers*. The idea is that via a URL you can locate (identify)
> data and fetch them: they were not designed to handle the opposite case
> where you have to send data to them. The HTTP POST is a workaround which
> is HTTP specific and goes way beyond the URL concept: there is no way to
> express in the URL syntax the *direction* of the data flow. And if you
> can't tell, looking at a URI, if it's "read" or "write" you will end up
> with troubles using it in an intermixed way.


Great analysis. In fact, I believe that the sense of "outward biasing"
of the Cocoon internal pipelines is a reflection of this lack of
direction information in URIs.

Now, I believe that URIs should *NOT* carry any direction information,
because it's up to other concern islands to provide it (as HTTP does).

For example, I was impressed by the elegance of the first servlet I saw
that used the doGet() method to generate the form, the doPost() method
to process it, and a doError() method to regenerate the form with error
indications (called directly by the doPost() method).

[I want this elegance to be perceived in the statemap as well!]

> What can be done, of course, is to use the URL to lookup a resource and
> operate on the result (getting an OutputStream or an XmlConsumer to
> write or send events to). This is easy for existing resources. But what
> happens when you get ResourceNotFoundException? Should you pass the
> error or just create a new (empty) resource with the name given as the
> URI? I think that this is an arbitrary decision that has nothing to do
> with the URL concept, and this kind of scares me.

Ok, let's go top-down on my wish list [:

1) indicate which "resource" we want to work on. Note: "resource" is a
much more neutral term than "source" or "destination" since it doesn't
convey any sense of flow direction, just a "location", an identifier.
Here, the URI is simply perfect, but it should be used *only* to
identify the resource: placing behavioral information in it would mix
concerns.

  URI uri = new URI("protocol://host/path/resource");
  Resource resource = ResourceDiscovery.getResource(uri);
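Step 1 can be sketched as a registry that picks a protocol handler purely from the URI scheme, so the URI itself stays free of any read/write hint. This is only a minimal sketch under assumptions: `Resource` and `ResourceDiscovery` are the hypothetical names used above, reduced to what the lookup needs, and `register()` is an invented extension point.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal Resource: just the identifier, no connectors yet.
interface Resource {
    URI getURI();
}

// Hypothetical registry: the handler is chosen by URI scheme only; the URI
// carries no behavioral (read/write) information.
final class ResourceDiscovery {
    private static final Map<String, Function<URI, Resource>> HANDLERS = new HashMap<>();

    // Invented extension point: plug in a factory for a given scheme.
    static void register(String scheme, Function<URI, Resource> factory) {
        HANDLERS.put(scheme, factory);
    }

    static Resource getResource(URI uri) {
        Function<URI, Resource> factory = HANDLERS.get(uri.getScheme());
        if (factory == null) {
            throw new IllegalArgumentException("no handler for scheme: " + uri.getScheme());
        }
        return factory.apply(uri);
    }
}
```

Registering a trivial handler is then a one-liner such as `ResourceDiscovery.register("file", uri -> () -> uri);`; a real factory would return a protocol-specific object.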

2) indicate what we want to do with this resource. Note: if a resource
is behaviorally neutral, we must specify how we want to interact with
it. Here, HTTP methods (GET, PUT, POST...) are the obvious example.

  resource.setAction(Resource.WRITE);

3) obtain the required connectors.

  resource.getOutputStream();
or 
  resource.getContentHandler();
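Steps 2 and 3 together might look like the following in-memory sketch: the action is declared separately from the identifier, and every connector checks that it agrees with the declared action. `MemoryResource`, its nested `Action` enum and the use of `IllegalStateException` for a mismatch are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical in-memory resource: setAction() declares the behavior
// (step 2) and each connector honors it (step 3).
final class MemoryResource {
    enum Action { READ, WRITE }

    private Action action = Action.READ;
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();

    void setAction(Action action) {
        this.action = action;
    }

    // Writable connector: only legal after setAction(Action.WRITE).
    OutputStream getOutputStream() {
        if (action != Action.WRITE) {
            throw new IllegalStateException("resource not opened for WRITE");
        }
        data.reset(); // a new write replaces the previous content
        return data;
    }

    // Readable connector: only legal while the action is READ.
    InputStream getInputStream() {
        if (action != Action.READ) {
            throw new IllegalStateException("resource not opened for READ");
        }
        return new ByteArrayInputStream(data.toByteArray());
    }
}
```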

                           - o -

The above augments the java.net design patterns with explicit behavioral
additions. The problem is that both steps 2 and 3 are behavior-dependent,
for example:

  URI uri = new URI("cvs://cvs.apache.org/xml-cocoon/README");
  Resource resource = ResourceDiscovery.getResource(uri);
  resource.setAction(Resource.READ);
  ((CVSResource) resource).fromBranch("xml-cocoon2");
  ContentHandler outputHandler = resource.getContentHandler();

But this requires casting. The following does not:

  URI uri = new URI("cvs://cvs.apache.org/xml-cocoon/README?fromBranch='xml-cocoon2'");
  Resource resource = ResourceDiscovery.getResource(uri);
  resource.setAction(Resource.READ);
  ContentHandler outputHandler = resource.getContentHandler();

but could generate IllegalStateExceptions if a ContentHandler is
*requested* on a READ action. In fact, the behavior can be inferred
automatically from the call to the connector (since the connector *does*
convey direction information):

  URI uri = new URI("cvs://cvs.apache.org/xml-cocoon/README?fromBranch='xml-cocoon2'");
  Resource resource = ResourceDiscovery.getResource(uri);
  ContentHandler outputHandler = resource.getContentHandler();
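The variant where the connector call itself fixes the direction could be sketched like this: the resource starts unbound, the first connector requested binds it to READ or WRITE, and asking for the opposite direction afterwards raises an `IllegalStateException`. `DirectionalResource` is a hypothetical name, and the returned streams are in-memory placeholders rather than a real cvs: handler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical resource with no setAction(): connectors carry the
// direction, so the first one requested decides it.
final class DirectionalResource {
    enum Action { READ, WRITE }

    private Action action; // null until a connector has been requested

    private void bind(Action wanted) {
        if (action == null) {
            action = wanted;
        } else if (action != wanted) {
            throw new IllegalStateException("already bound for " + action);
        }
    }

    OutputStream getOutputStream() {
        bind(Action.WRITE);
        return new ByteArrayOutputStream(); // placeholder sink
    }

    InputStream getInputStream() {
        bind(Action.READ);
        return new ByteArrayInputStream(new byte[0]); // placeholder source
    }

    Action getAction() {
        return action;
    }
}
```

Compared with an explicit setAction(), this saves a call but makes the order of connector calls significant.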

Is this enough?

The uniform syntax of URIs allows for completely transparent polymorphic
behavior (or, at least, it seems so). There are URI-based descriptors
for a bunch of protocol interfaces (IMAP, POP, addressbook, file, FTP,
HTTP, etc.).
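One way a protocol handler could exploit the uniform syntax, and avoid the CVSResource cast in the cvs: example, is to read its own options from the query string. `QueryOptions` is a hypothetical helper, not an existing API; it also strips the single quotes used in `fromBranch='xml-cocoon2'`.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a protocol handler (say, for cvs:) pulls its options
// out of the URI query string, so callers never need a protocol-specific cast.
final class QueryOptions {
    static Map<String, String> parse(URI uri) {
        Map<String, String> options = new LinkedHashMap<>();
        // getQuery() returns the decoded query, or null when there is none;
        // since decoding happens before splitting, an escaped '&' (%26)
        // inside a value is not supported by this sketch.
        String query = uri.getQuery();
        if (query == null) {
            return options;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // accept the quoted style of the examples: fromBranch='xml-cocoon2'
            if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
                value = value.substring(1, value.length() - 1);
            }
            options.put(key, value);
        }
        return options;
    }
}
```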

Of course, there are protocol handlers that *must* throw exceptions if
some behaviors cannot be implemented. For example, the following should
throw an exception:

  URI uri = new URI("smtp://mail.myhost.com/");
  Resource resource = ResourceDiscovery.getResource(uri);
  InputStream is = resource.getInputStream();  // <--- throws exception!

because you can't read from an SMTP resource.

Anyway, the Resource interface should have:

 interface Resource {

        // Writable connectors
        OutputStream getOutputStream() throws InvalidMethodException;
        Writer getWriter() throws InvalidMethodException;
        ContentHandler getContentHandler() throws InvalidMethodException;

        // Readable connectors
        InputStream getInputStream() throws InvalidMethodException;
        Reader getReader() throws InvalidMethodException;
        void setContentHandler(ContentHandler handler) throws InvalidMethodException;

 }
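The interface could be made concrete with a base class that refuses every connector by default, so that a handler such as the smtp one overrides only the directions it supports and the "can't read from SMTP" case falls out automatically. `InvalidMethodException`, `AbstractResource` and `SmtpLikeResource` are hypothetical; the smtp-like class just buffers the message instead of talking to a mail server.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.ContentHandler;

// Hypothetical checked exception named in the proposed interface.
class InvalidMethodException extends Exception {
    InvalidMethodException(String message) {
        super(message);
    }
}

interface Resource {
    // Writable connectors
    OutputStream getOutputStream() throws InvalidMethodException;
    Writer getWriter() throws InvalidMethodException;
    ContentHandler getContentHandler() throws InvalidMethodException;
    // Readable connectors
    InputStream getInputStream() throws InvalidMethodException;
    Reader getReader() throws InvalidMethodException;
    void setContentHandler(ContentHandler handler) throws InvalidMethodException;
}

// Hypothetical base: every connector is refused unless a subclass overrides it.
abstract class AbstractResource implements Resource {
    private InvalidMethodException refuse(String method) {
        return new InvalidMethodException(getClass().getSimpleName() + " does not support " + method);
    }

    public OutputStream getOutputStream() throws InvalidMethodException { throw refuse("getOutputStream"); }
    public Writer getWriter() throws InvalidMethodException { throw refuse("getWriter"); }
    public ContentHandler getContentHandler() throws InvalidMethodException { throw refuse("getContentHandler"); }
    public InputStream getInputStream() throws InvalidMethodException { throw refuse("getInputStream"); }
    public Reader getReader() throws InvalidMethodException { throw refuse("getReader"); }
    public void setContentHandler(ContentHandler handler) throws InvalidMethodException { throw refuse("setContentHandler"); }
}

// Hypothetical write-only, smtp-like resource: only a writer is provided.
final class SmtpLikeResource extends AbstractResource {
    private final StringWriter message = new StringWriter();

    @Override
    public Writer getWriter() {
        return message; // a real handler would pass this to a mail transport
    }

    String pendingMessage() {
        return message.toString();
    }
}
```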

We could come up with some Monitorizable interface to add monitoring
capabilities that connect to the cache. 

The URI schemes I find useful for Cocoon are:

 1) file: -> obviously
 2) dbxml: -> obvious again
 3) http: -> reading is done through GET, writing through PUT or POST
 4) webdav: -> [should this be different from HTTP?]
 5) cvs: -> would allow Cocoon to generate the documentation directly
out of CVS. A plus when you don't have much local storage capacity (say,
on a diskless embedded system, but maybe this is FS)
 6) ftp: -> hardly anybody uses FTP nowadays, but legacy systems do.
 7) imap: -> would be killer for a Cocoon-based webmail application (I
know Gianugo was thinking about implementing this), but a direct
JavaMail interface would probably be much more useful.
 8) smtp: -> would allow us to serialize a pipeline into an email. Might
be useful or might be FS, see above.
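For the http: entry, the direction could live entirely in the HTTP method while the URL stays the same, which matches the "URIs carry no direction" rule. A minimal sketch on the standard `java.net.HttpURLConnection`; `HttpResource` and `prepare()` are hypothetical names, and `openConnection()` only creates the connection object without sending anything.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical http: handler: one URL for both directions; only the HTTP
// method (GET to read, PUT to write) says which way the data flows.
final class HttpResource {
    private final URL url;

    HttpResource(URL url) {
        this.url = url;
    }

    // Configures, but does not open, a connection for the requested direction.
    // Nothing goes over the network until getInputStream()/getOutputStream()
    // (or connect()) is called on the returned object.
    HttpURLConnection prepare(boolean write) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (write) {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true); // the request body goes to conn.getOutputStream()
        } else {
            conn.setRequestMethod("GET"); // the response body comes from conn.getInputStream()
        }
        return conn;
    }
}
```

Choosing PUT rather than POST for writes is a judgment call: PUT stores the body at that URL, whereas POST hands it to the resource for processing.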

What I don't really find useful:

 1) sql: -> how do you map a table to a path?
 2) ldap: -> yeah, the tree-like directory structure looks appealing,
but how would you save an XML file into an LDAP tree? Fragment the
entire document into nodes and store those? Bah, I don't find it very
relevant.

Cocoon-specific stuff:

 1) resource: -> gets stuff from the current classpath [might not be
that useful once we have the others below, but it doesn't hurt to have
it. Obviously, writing methods are illegal]
 2) cocoon: -> gets stuff from Cocoon-served space. It would be killer to
have both internal reading (as for content aggregation) and writing (as
for content disassembly [the opposite of aggregation]: storing
different namespaces in different locations, abstracting from the way
they are implemented as pipelines)
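The read-only resource: scheme is easy to sketch on top of the class loader: the path part of the URI is looked up on the classpath and writing is always refused. `ClasspathResource` is a hypothetical name, and using `UnsupportedOperationException` for the illegal write is an assumption (the proposed interface would throw `InvalidMethodException` instead).

```java
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

// Hypothetical read-only handler for the resource: scheme, e.g.
// resource:/org/example/doc.xml is looked up on the classpath.
final class ClasspathResource {
    private final String path;

    ClasspathResource(URI uri) {
        String p = uri.getSchemeSpecificPart(); // e.g. "/org/example/doc.xml"
        this.path = p.startsWith("/") ? p.substring(1) : p; // ClassLoader names have no leading '/'
    }

    InputStream getInputStream() throws FileNotFoundException {
        InputStream in = ClasspathResource.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
            throw new FileNotFoundException("not on classpath: " + path);
        }
        return in;
    }

    OutputStream getOutputStream() {
        // the classpath is treated as read-only, so writing is always illegal
        throw new UnsupportedOperationException("resource: is read-only");
    }
}
```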

I like the power that a bidirectional 'cocoon:' protocol would give us:
just as content can be aggregated from different sources, even internal
ones (layered I/O is a feature that took years for the Apache 2.0
project to implement, and it still passes unstructured byte streams
between modules, making it virtually useless for SAX-based component
pipelines), we could now have a way to "disassemble" content into
different locations. [That might require a new sitemap semantic, such
as <map:disassemble>, but that would be back-compatible since it's an
addition.]
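A rough idea of what "disassembly" could mean at the SAX level: a handler that forwards each subtree rooted in a mapped namespace to its own destination handler (for example, a storing serializer) and drops the rest. `NamespaceDisassembler` is hypothetical and deliberately simplified.

```java
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "disassembler", the reverse of aggregation: each subtree whose
// root element belongs to a mapped namespace is sent, as its own document, to
// that namespace's handler; unmapped content is dropped. Prefix-mapping and
// other less common events are not forwarded in this sketch.
final class NamespaceDisassembler extends DefaultHandler {
    private final Map<String, ContentHandler> targets;
    private ContentHandler current; // receives the subtree being routed, or null
    private int depth;              // element nesting inside that subtree

    NamespaceDisassembler(Map<String, ContentHandler> targets) {
        this.targets = targets;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        if (current == null) {
            current = targets.get(uri); // stays null for unmapped namespaces
            if (current != null) {
                depth = 0;
                current.startDocument();
            }
        }
        if (current != null) {
            depth++;
            current.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (current != null) {
            current.endElement(uri, local, qName);
            if (--depth == 0) { // the routed subtree is complete
                current.endDocument();
                current = null;
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) {
            current.characters(ch, start, length);
        }
    }
}
```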

Anyway, just as reading from a resource identified by the cocoon:
protocol allows more solid contracts to be defined (creating a level of
indirection that can be used to change underlying implementations
without having to change the rest), writing would be equally powerful.

For example, 

  URI uri = new URI("cocoon://storage/" + relative_path);
  Resource resource = ResourceDiscovery.getResource(uri);
  setContentHandler(resource.getContentHandler());  // our SAX output now flows into the resource

would allow us to write information to a logical location, completely
detached from the physical implementation of the storage phase. Then, we
could have an internal-only sitemap associated with the "/storage" URI.
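The indirection could be as simple as a mount table that maps the logical `cocoon://storage/` prefix onto whatever physical store is configured, standing in for the internal-only sitemap. `StorageMap`, `mount()` and `resolve()` are hypothetical names, and the sketch does not check that a relative path containing `..` stays under its mount.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mount table: logical cocoon://storage/... locations are
// resolved against a configurable physical base URI.
final class StorageMap {
    private final Map<String, URI> mounts = new LinkedHashMap<>();

    void mount(String logicalPrefix, URI physicalBase) {
        mounts.put(logicalPrefix, physicalBase);
    }

    // The first matching prefix wins, in insertion order.
    URI resolve(URI logical) {
        String s = logical.toString();
        for (Map.Entry<String, URI> e : mounts.entrySet()) {
            if (s.startsWith(e.getKey())) {
                String relative = s.substring(e.getKey().length());
                return e.getValue().resolve(relative);
            }
        }
        throw new IllegalArgumentException("no mount for " + logical);
    }
}
```

Swapping the physical store then means changing one mount() call, while writers keep using the same logical URI.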

Now, this appears to be a cool concept, but we have an impedance
mismatch that rings a bell:

 1) for outward flow we have

      g1 -> t1 -> t2 -> s
            ^
            |
            t3
            ^
            |
            g2

   where the internal serializer is removed because it is useless.

 2) for inward flow we would have

     g -> t1 -> t2 -> s1
          |
          v
          t3 -> store
          |
          v
          s2
                      
where the second generator is removed because it is useless.

Ok, but what is the second serializer doing?

We could reshape it as

     g -> t1 -> t2 -> s1
          |
          v
          t3
          |
          v
          s2

where s2 is the "serializer" that actually performs the storage.
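In SAX terms, the reshaped branch is just a tee: every event reaching the branch point goes both down the main pipeline and into a serializer whose output is the store. This sketch uses the standard JAXP identity `TransformerHandler` as that serializer, writing into a `StringWriter` for simplicity; `TeeHandler` and `StoringSerializer` are hypothetical names, and only the most common SAX events are forwarded.

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tee: duplicates each SAX event to the main pipeline and to a branch.
final class TeeHandler extends DefaultHandler {
    private final ContentHandler pipeline;
    private final ContentHandler branch;

    TeeHandler(ContentHandler pipeline, ContentHandler branch) {
        this.pipeline = pipeline;
        this.branch = branch;
    }

    @Override
    public void startDocument() throws SAXException {
        pipeline.startDocument();
        branch.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        pipeline.endDocument();
        branch.endDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        pipeline.startPrefixMapping(prefix, uri);
        branch.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        pipeline.endPrefixMapping(prefix);
        branch.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        pipeline.startElement(uri, local, qName, atts);
        branch.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        pipeline.endElement(uri, local, qName);
        branch.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        pipeline.characters(ch, start, length);
        branch.characters(ch, start, length);
    }
}

// The "store" as a full-blown serializer: a JAXP identity TransformerHandler
// is a ContentHandler that writes the events it receives as XML text.
final class StoringSerializer {
    static TransformerHandler into(StringWriter store) throws TransformerConfigurationException {
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(new StreamResult(store));
        return handler;
    }
}
```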

This has an interesting implication for 'regular' pipelines: shouldn't

  data in -> g => t1 => t2 => s => data out
                        |^
                        v|
                       store

where '->' is a binary stream and '=>' a SAX stream

be reshaped as

  data in -> g1 => t1 => store -> g2 => t2 => s -> data out

where "store" is now a full blown serializer?

This would allow the pipelines to be more "symmetrical".
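The symmetrical shape can be tried out with nothing but JAXP: the first half ends in an ordinary serializer that writes bytes (the store), and the second half starts with an ordinary generator, a SAX parser reading those bytes back. Transformers are omitted, a byte array stands in for real storage, and `SymmetricPipeline`, `store()` and `regenerate()` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical sketch of  g1 => store -> g2 => ...  (transformers omitted):
// "store" is an ordinary serializer and g2 an ordinary generator.
final class SymmetricPipeline {
    private static XMLReader newGenerator() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // g1 => store: parse the input and serialize the SAX events to bytes.
    static byte[] store(String input) throws Exception {
        ByteArrayOutputStream stored = new ByteArrayOutputStream();
        TransformerHandler serializer =
            ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
        serializer.setResult(new StreamResult(stored));
        XMLReader g1 = newGenerator();
        g1.setContentHandler(serializer);
        g1.parse(new InputSource(new StringReader(input)));
        return stored.toByteArray();
    }

    // store -> g2: regenerate SAX events from the stored bytes and push
    // them into the rest of the pipeline.
    static void regenerate(byte[] stored, ContentHandler downstream) throws Exception {
        XMLReader g2 = newGenerator();
        g2.setContentHandler(downstream);
        g2.parse(new InputSource(new ByteArrayInputStream(stored)));
    }
}
```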

Bah, anyway, it turned out to be an RT.

Let's see what you think about this.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------

