Hi Stefano! Thanks for your comments. They are much appreciated, although I think there is some misunderstanding on certain points.
> -----Original Message-----
> From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]]
> Sent: Monday, 10 February 2003 13:02
> To: [EMAIL PROTECTED]
> Cc: Andreas Hochsteger; Andreas Hochsteger
> Subject: Re: [PROPOSAL] Cocoon Science Fiction
>
> Andreas,
>
> thanks for taking the time to write this. It is very appreciated. See
> my personal comments inside. NOTE: they are 'personal' comments and
> must be treated as such; they never represent the Cocoon development
> community, but my personal vision of things.
>
> [snip]
>
> > WARNING:
> > I have to say that this proposal is intended for open-minded people
> > only, who aren't afraid to take a look beyond the limits.
>
> I think I can state I'm not afraid to look beyond limits, especially
> my own, especially those I can't see until others point me to them.
> At the same time, I like not to turn off my 'critical mode' while I
> do so. Please don't misinterpret this as fear of going forward, but
> as caution in doing so.

I have no problem with your 'critical mode' comments, since I didn't
assume that everyone would accept this proposal as it is. If only one
sentence of my proposal causes a little change or enhancement of
Cocoon, then I have already succeeded ;-)

> [snip]
>
> > 3 Introduction
> > ==============
> >
> > I like the Cocoon pipeline processing concept very much.
> > I like it so much that I think it is a pity to limit it only to XML
> > processing (although I agree that this is the most interesting
> > application).
>
> These two sentences are antithetical and/or imprecise.
>
> The Cocoon pipeline model is different from the more general
> Pipes & Filters design pattern because it deals with structured data,
> unlike P&F, which deals with non-structured data.
>
> The Cocoon pipeline is *not* literally limited to XML. It is entirely
> possible to have non-well-formed XML content flow into the pipeline
> (even if this is avoided as a general pattern).
> It is correct to say that Cocoon pipelines are limited to SAX events,
> and SAX events are a particular kind of structured data.
>
> With these corrections, you are basically stating that limiting
> pipelines to a particular type of structured data is limiting.

This is what I wanted to say. I see that you are very nit-picky about
the terms I use. I like that, since it's something I do myself all the
time. It's important to use terms as they are meant to be used (and I
thought that I had reviewed it often enough :-().

> While I understand your concept, I strongly disagree: SAX provides a
> multidimensional structured data space which is suitable for *any*
> kind of data structure.

That's interesting. Do you mean namespaces by "multidimensional
structured data space"? But I doubt that placing binary or non-XML/SAX
text inside structured XML tags will solve it all ;-)

> True, maybe not as efficiently as other formats, but removing a fixed
> contract between pipeline components will require a pluggable and
> metadata-driven parsing/serialization stage between each component.
>
> I don't see any value in this compared to the current approach of SAX
> adaptation of external data to the internal model.

Perhaps you misunderstand something here. I don't want to change the
way Cocoon handles SAX events right now. It's more about how we could
handle non-SAX data streams a bit better.

> > I'm sure some of you wanted to be able to build applications the
> > same way Unix shell pipes work. Cocoon was a big step in this
> > direction, but it was only applicable to processing XML data.
>
> *only XML* is misleading. *based on SAX* is the sentence. I've never
> perceived this as a limitation, but as a paradigm shift.

Agreed. But the real world is not SAX-based, and some better way to
handle non-SAX data streams is demanded.

> Topologically speaking, the solution space is rotated, but its size
> is not reduced.
> > There are so many cases where pipeline processing of data (no
> > matter if it is XML, plain text or binary data) is done today, but
> > we are lacking a generic and declarative way to unify these
> > processing steps. Cocoon is best suited for this task through its
> > clean and easy to understand yet powerful pipeline concept.
>
> If you want to create pipelines for general data, why use Cocoon?
> Just use a UNIX pipe, or servlet filters, or Apache 2.0 modules, or
> any type of 'byte-oriented' (thus unstructured data) pipes-and-filters
> modules.

This way I lose the great declarative concept of Cocoon pipelines and
the integration with it.

> If you remove the structure from the pipeline data that flows, Cocoon
> will not be Cocoon anymore. This is not evolution, it is extinction.

Same misunderstanding as above. As I pointed out in "11 Converting old
sitemaps to new sitemaps", the components dealing with "/text/xml" are
not very different from those available today. I don't want to remove
the structure from the data flowing through the pipeline in any way.

> > 4 Pipeline Types
> > ================
> >
> > I tried to design several pipeline variants, but after thinking a
> > while they were all still too limited for the way I wanted them to
> > work.
> >
> > So here's another try, giving some hypotheses first:
> > 1. A pipeline can produce data
> > 2. A pipeline can consume data
> > 3. A pipeline can convert data
> > 4. A pipeline can filter data
> > 5. A pipeline can accept a certain data format as input
> > 6. A pipeline can produce a certain data format as output
> > 7. Pipeline components follow the same hypotheses (1-6)
> > 8. Only pipeline components with compatible data formats can be
> >    arranged next to each other
>
> Ah, here you hint that you don't want to remove data structuredness
> in the pipeline, you just want to add *other* data structures besides
> SAX events.

Yes, that's what I want to do...

> Ok, this is worth investigating.
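To make hypotheses 5, 6 and 8 a bit more concrete, here is a purely
hypothetical sitemap sketch (the element names and `*-format`
attributes follow the proposal's own examples; nothing like this
exists in Cocoon today). Hypothesis 8 would let the sitemap reject the
pipeline at setup time if adjacent formats did not match:

```xml
<!-- hypothetical syntax, following the proposal's examples -->
<map:pipeline>
  <!-- hypotheses 1+6: produce data with a declared output format -->
  <map:produce type="uri" ref="docs/file.xml" output-format="/text/xml"/>
  <!-- hypotheses 3+8: this filter may follow the producer only
       because its input-format matches the producer's output-format -->
  <map:filter type="prettyxml" input-format="/text/xml"
              output-format="/text/xml"/>
</map:pipeline>
```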
[snip]

> > 5 Data Formats
> > ==============
> >
> > With "data format" I mean something like XML, plain text, png,
> > mp3, ...
> > I'm not yet really sure how we should specify data formats, so I'll
> > try to start with some requirements:
> > 1. They should be easy to remember and to specify ;-)
> > 2. It should be possible to create derived data formats
> >    (-> inheritance)
> > 3. It should be possible to specify additional information
> >    (e.g. MIME type, DTD/Schema for XML, ...)
> > 4. Pipelines which accept a certain data format as input can be fed
> >    with derived data formats
> > 5. We should not reinvent standards which are already suited for
> >    this task (but I fear nothing suitable exists yet)
>
> You are asking for a very abstract parsing grammar. Note, however,
> that it is pretty easy to point to examples where these grammars will
> have to be so complex that maintaining them would be a nightmare.

I don't think that this grammar is very complex. See "5.1 Data Format
Definition". It only consists of <data:format .../> with optional
parameters.

> Think of a BNF-like grammar that is able to explain concepts like XML
> namespacing or HyTime Architectural Forms.
>
> > To make it easier for us to begin with the task of defining data
> > formats, let's assume we have three basic data formats called
> > "abstract", "binary" and "text". The format "abstract" will be
> > explained later, but "binary" and "text" should be clear to
> > everyone.
>
> Binary and text are unstructured data streams. You are falling back.

We don't fall back, since the structuredness is kept for XML. We only
gain the additional possibility of processing unstructured data
streams.
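Since section 5.1 is snipped below, here is a minimal sketch of what
such a <data:format .../> declaration could look like (the attribute
names are my assumptions for illustration, not the proposal's actual
grammar). A derived format inherits from its parent path (requirement
2) and can carry extra information such as a refined MIME type
(requirement 3):

```xml
<!-- hypothetical sketch; attribute names are assumptions -->
<data:format name="/text/xml" mime-type="text/xml"/>
<!-- derived format: inherits from /text/xml by its path (requirement
     2) and refines the associated MIME type (requirement 3) -->
<data:format name="/text/xml/svg" mime-type="image/svg+xml"/>
```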
> > 5.1 Data Format Definition
> > --------------------------
> [snip]
> > 5.3 A word about MIME Types
> > ---------------------------
> >
> > If you ask me why I don't use the standardized MIME types (see [2])
> > to specify data formats, I can give you the following reasons:
> > MIME types fulfill the requirements from above only partly. They
> > support just two levels of classification, and they are
> > purpose-oriented. The data formats I suggest are instead
> > content-oriented (/text/xml/svg vs. image/svg-xml). So both serve
> > different purposes.
> >
> > I know the importance of supporting the MIME type standard, and so
> > the parameter 'mime-type' is part of the super data format 'any'
> > and thus is available for every other data format too. By
> > specifying a certain data format, you always have a MIME type
> > associated; in the worst case, the MIME type from the super data
> > format 'any' (application/octet-stream) is used.
>
> From what I see so far, you are describing nothing different (from an
> architectural point of view) from what we already have.

That's not what I wanted to do.

> > 5.4 Data Handlers
> > -----------------
> >
> > I'm not very sure what the data handlers actually do, but I can
> > think of either defining an interface which must be implemented by
> > the pipeline components which operate with a certain data format
> > (do we need two handlers here: input handler and output handler?),
> > or they are concrete components which can be used by the pipeline
> > components to consume or produce this data format. I think some
> > discussion on this topic might not be bad.
>
> Here you hit the nerve.
>
> If you plan on having a different data-handling interface for each
> data type (or data-type family), the permutation of components will
> kill you.

Yes, I was aware of this problem. That's why I'm very interested to
hear your comments ;-) But what I mean here is not an interface for
each data type.
I rather mean to provide a reusable component which knows how to deal
with a certain data format. This component can be used by other
pipeline components. But I have not thought about it very much yet.

> > 5.5 Data Format Determination
> > -----------------------------
> >
> > In many cases I've written the input and output formats along with
> > the pipeline components, but it is also possible to specify them in
> > the <map:components/> section, or implicitly by implementing a
> > certain component interface and therefore omitting them in the
> > pipeline.
> >
> > Here's a suggested order of data format determination:
> >
> > 1. Input/output format specified directly with a pipeline
> >    component:
> >    <map:produce type="uri" ref="docs/file.xml"
> >                 output-format="/text/xml"/>
> > 2. Input/output format specified by the component declaration:
> >    <map:filters>
> >      <map:filter name="prettyxml" input-format="/text/xml"
> >                  output-format="/text/xml" ... />
> >    </map:filters>
> > 3. Output/input format specified by the previous or following
> >    pipeline component:
> >    <map:produce type="uri" ref="docs/file.xhtml"
> >                 output-format="/text/xml/xhtml"/>
> >    <!-- input- and output-format="/text/xml/xhtml" from previous
> >         pipeline component -->
> >    <map:filter type="prettyxml"/>
> > 4. Input/output format specified directly with a pipeline:
> >    <map:pipeline input-format="/text/xml"
> >                  output-format="/text/xml">
> >      <map:filter type="prettyxml"/>
> >      ...
> >    </map:pipeline>
> > 5. If nothing from above matches, then assume "none".
>
> eheh, I wish it was that easy ;-)
>
> Suppose you have a component that operates only on the svg: namespace
> of a SAX stream; what is the input type?
>
> If data types are monodimensional, the above is feasible, but Cocoon
> pipelines are *already* multi-dimensional, and the above can't
> possibly work (this has been discussed extensively before for
> pipeline validation).

You got me! This is something I hadn't thought about yet.
Perhaps using only "/text/xml" for such cases, without dealing with
derived XML data formats, solves it?

> > 6 Pipeline Components
> > =====================
> [snip]
>
> Assuming you have several structured pipelines:
>
> - SAX -> all xml/sgml content
> - output/input streams -> unstructured text/binary
> - OLE -> all OLE-based files (word, excel, blah blah)
> - MPEG -> all MPEG-based framed multimedia (MPEG1/2, mp3)
>
> why would you want to mix them into the same system?
>
> I mean, if you want to apply structured-pipeline architectures to,
> say, audio editing, you are welcome to do so, but why in hell should
> Cocoon have to deal with this?

Because ...
* it provides a good framework for these tasks
* more and more data processing is done in XML (even publishing, 3D,
  music, ...)
* it is necessary to integrate both for migration from legacy data
  formats to XML

> You are very close to winning the prize for the FS award of the
> year :)

Oh, what a privilege ;-)

> It *would* make sense to add these complexities only if processing
> performed in different realms could be interoperated. But I can't
> see how.
>
> What does it mean to perform XSLT transformation on a video stream?
>
> What does it mean to perform audio mixing on an email?

The 'misuse' you sketched will be detected through the use of data
formats:
* An XSLT transformer will only operate on "/text/xml"
* An audio mixer will only operate on "/abstract/sound"

> It would not make any sense to add functionality inside Cocoon that
> does not belong in the realm of its problem space. It would only
> dilute the effort in additional complexity just for the sake of
> flexibility.

Cocoon is already used for data integration in many areas. The
possibilities of data integration should not stop with the Reader
component, and converting every legacy data format to XML before
processing it is not always possible.
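As a sketch of how such a mismatch could be detected, consider this
deliberately invalid pipeline (hypothetical syntax as in the
proposal's examples; the "audio" producer type is invented for
illustration). The declared formats simply don't line up, so the
sitemap could refuse the pipeline when it is built rather than
producing garbage at runtime:

```xml
<!-- hypothetical and deliberately invalid: an XSLT filter fed with
     sound data; the mismatch between /abstract/sound and /text/xml
     could be rejected at sitemap setup time -->
<map:pipeline>
  <map:produce type="audio" ref="music/track.mp3"
               output-format="/abstract/sound"/>
  <map:filter type="xslt" input-format="/text/xml"
              output-format="/text/xml"/>  <!-- ERROR: incompatible -->
</map:pipeline>
```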
[snip]

> > 7.1 Web Services
> > ----------------
> >
> > As many of you know, there exist two popular styles of using Web
> > Services: SOAP and REST.
> > Both have their own advantages and disadvantages, but I'd like to
> > concentrate on SOAP and on its transport protocol independence,
> > because REST-style Web Services are already possible with Cocoon.
> >
> > SOAP allows us to use any transport protocol to deliver SOAP
> > messages. Mostly HTTP(S) is used for this, but there are many
> > cases where you have to use other protocols (like SMTP, FTP, ...).
> > Whatever protocol you choose to invoke your Web Services, the
> > result should always be the same, and the response should be
> > delivered back through (mostly) the same protocol. Herein lies one
> > of the greatest advantages of the protocol independence.
>
> No, this is not protocol independence. This is transport
> independence; you are still dependent on SOAP as a protocol.

What I meant was 'transport protocol independence'.

[snip]

> > 8 Protocol Handler
> > ==================
>
> I don't think Cocoon should implement protocol handlers. Cocoon is a
> data producer and should not deal with transport.

I agree that it is not the task of Cocoon to deal with transport. But
Cocoon already does this to a certain degree with the HTTP protocol
(headers!) and is therefore bound to HTTP. You can't easily serialize
an SVG to a JPEG and deliver it via e-mail. So if I want to be able to
deliver the output of a pipeline via different transport channels, I
have to break up this tight binding to HTTP.

> We already have enough problems trying to come up with an Environment
> that could work with both email and web (which have orthogonal
> client/server paradigms); I don't want to further increase the
> complexity down this road.
I know that this means additional complexity, but currently this
complexity is already hidden in other components (Reader, Serializer)
and is therefore mixed with different concerns. Why should an SVG2JPEG
serializer have to deal with HTTP headers? I think separation of
concerns is not given here.

> [snip]
>
> > 11 Converting old sitemaps to new sitemaps
> > ==========================================
> >
> > Some of you might be interested in whether this new concept is
> > flexible enough to provide at least the same functionality as
> > Cocoon does today.
>
> Yes, I agree that the architecture you describe can be seen as an
> 'extension' of what Cocoon has today; therefore it is possible to
> rewrite current sitemaps in the model you propose.
>
> Yet, I fail to see the advantage of doing so, since you don't gain
> any functionality in the problem space where Cocoon lives.

I see great advantages, for the reasons I provided above.

[snip]

> You don't say *why* we should do this. What do we gain? Why should we
> do audio/video processing on the server side? Why should we introduce
> components that work on just one pipeline model and can't be shared
> with others?

The reasons are given above.

> Oh, you definitely win my vote for the FS of the year award :)

Thanks ;-)

> --
> Stefano Mazzocchi <[EMAIL PROTECTED]>
> Pluralitas non est ponenda sine necessitate [William of Ockham]
> --------------------------------------------------------------------

Bye, Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]