Jakob Spörk wrote:
> Hello,

Hello Jakob,

> I just want to give my thoughts to unified pipeline and data conversion
> topic. In my opinion, the pipeline can't do the data conversion, because it
> has no information about how to do this. Let's take a simple example: We
> have a pipeline processing XML documents that describe images. The first
> components process this xml data while the rest of the components do
> operations on the actual image. Now is the question, who will transform the
> xml data to image data in the middle of the pipeline? 

I agree with you that the pipeline implementation should not handle data
conversion, because there is no generic way to handle it.

Now I would like to answer your question: it should be another /pipeline 
component/ that handles data conversion.

> I believe the pipeline cannot do this, because it simply do not know how to
> transform, because that’s a custom operation. You would need a component
> that is on the one hand a XML consumer and on the other hand an image
> producer. Providing some automatic data conversions directly in the pipeline
> may help developers that need exactly these default cases but I believe it
> would be harder for people requiring custom data conversions (and that are
> most of the cases).
> 
> The actual architecture allows to fit any components into the pipeline, and
> only the components itself have to know if they can work with their
> predecessor or the component following them. That allow most flexibility
> when thinking about any possible conversions. If a pipeline should do this,
> you would need "plug-ins" for the pipeline that are registered and allow the
> pipeline to do the conversions. But then, it is the responsibility of the
> developer to register the right conversion plug-ins and you would have get
> new problems if a pipeline requires two different converters from the same
> to the same data type because the pipeline cannot have automatically the
> information which converter to use in which situation.

I believe that these problems could be addressed by... the compiler. In my
opinion, pipelines should be type-safe, which basically means that for a given
pipeline fragment you know what it expects as input and what kind of output it
gives you. The same goes for components. This eliminates the "flexibility" of
having a component that accepts more than one kind of input or produces more
than one kind of output. I believe that having more than one kind of input or
output only adds complexity and does not solve any problem.
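
To illustrate what I mean by letting the compiler do the work, here is a
minimal sketch in Java. The names Component, then and TypedPipelineSketch are
made up for this example and are not taken from any existing Cocoon API; the
only point is that a chain whose output and input types do not match is
rejected at compile time.

```java
// Hypothetical names throughout; this is an illustration, not an existing API.
interface Component<I, O> {
    O process(I input);

    // Chaining only type-checks when this component's output type
    // is acceptable as the next component's input type.
    default <R> Component<I, R> then(Component<? super O, ? extends R> next) {
        return input -> next.process(process(input));
    }
}

final class TypedPipelineSketch {
    static final Component<String, Integer> LENGTH = String::length;
    static final Component<Integer, String> DESCRIBE = n -> "length=" + n;

    static String run(String input) {
        // The pipeline fragment advertises its input and output types.
        Component<String, String> pipeline = LENGTH.then(DESCRIBE);
        // DESCRIBE.then(DESCRIBE) would not compile: a String is not an Integer.
        return pipeline.process(input);
    }
}
```

With these definitions, TypedPipelineSketch.run("abc") returns "length=3".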

If a component accepted more than one kind of input, how could a user know the
list of accepted inputs? I guess the only way to find out would be to read the
source and look for all the "instanceof" checks in its code.

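For comparison, this is roughly what such a multi-input component tends to look
like. It is a made-up example (UntypedComponentSketch is not a real class
anywhere): the signature tells the caller nothing, and an unsupported input is
only rejected at run time.

```java
// Hypothetical untyped component: Object in, Object out.
final class UntypedComponentSketch {
    static Object process(Object input) {
        // The accepted inputs are only discoverable by reading these checks.
        if (input instanceof String) {
            return ((String) input).length();
        }
        if (input instanceof byte[]) {
            return ((byte[]) input).length;
        }
        // Everything else fails here, not at compile time.
        throw new IllegalArgumentException(
                "unsupported input: " + input.getClass().getName());
    }
}
```
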
I would prefer a situation where components have a well-defined type of input
and output, and if you want to combine components whose input-output pairs do
not match, you add converters as intermediate components.
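
Applied to the XML-to-image example from your mail, a converter as an
intermediate component could look like the sketch below. I use
java.util.function.Function as the component type to keep it short;
XmlImageDescription, Image and the three stages are stand-ins invented for this
example, not real Cocoon types.

```java
import java.util.function.Function;

final class ConverterSketch {
    // Stand-in for XML data describing an image (attribute values kept as text).
    static final class XmlImageDescription {
        final String width;
        final String height;
        XmlImageDescription(String width, String height) {
            this.width = width;
            this.height = height;
        }
    }

    // Stand-in for actual image data.
    static final class Image {
        final int width;
        final int height;
        Image(int width, int height) {
            this.width = width;
            this.height = height;
        }
    }

    // XML stage: XML in, XML out.
    static final Function<XmlImageDescription, XmlImageDescription> TRIM =
            d -> new XmlImageDescription(d.width.trim(), d.height.trim());

    // Converter: an XML consumer on one side, an image producer on the other.
    static final Function<XmlImageDescription, Image> XML_TO_IMAGE =
            d -> new Image(Integer.parseInt(d.width), Integer.parseInt(d.height));

    // Image stage: image in, image out.
    static final Function<Image, Image> DOUBLE_SIZE =
            i -> new Image(i.width * 2, i.height * 2);

    static Image run(XmlImageDescription input) {
        // TRIM.andThen(DOUBLE_SIZE) would not compile; the converter has to
        // be placed explicitly between the XML stage and the image stage.
        return TRIM.andThen(XML_TO_IMAGE).andThen(DOUBLE_SIZE).apply(input);
    }
}
```

The developer still chooses which converter to use, but the choice is visible
in the pipeline definition and checked by the compiler.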

I've been thinking about generic but at the same time type-safe pipelines for
some time. I designed them on paper and everything looked quite promising.
Then I moved on to implementing my ideas and got a rather disappointing result,
which can be seen here:
http://github.com/gkossakowski/cocoonpipelines/tree/master

The most interesting files are:
http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/Pipeline.java
(generic and type-safe pipeline interface)

http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/PipelineComponent.java
(generic and type-safe component def.)

http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/demo/RunPipeline.java
(shows how to use that thing)

> The only thing cocoon can help here with is to provide as much "standard"
> converters for use as possible, but it is still the responsibility of the
> developer to use the right ones.

I think Cocoon could define a much better, type-safe Pipeline API, but we are in
the unfortunate situation of using a language that makes it extremely hard to
express this kind of generic solution.

Of course, I would like to be proven wrong and shown that Java is powerful
enough to let us express our ideas and solve our problems. Actually, the whole
idea of a pipeline is not rocket science, as it is, in essence, just ordinary
function composition. The only unique property of pipelines I can see is that
we want access to _partial_ results of pipeline execution so that we can make
it streamable.
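
As a rough illustration of "function composition plus partial results", the
sketch below composes two functions and applies the result lazily, one element
at a time, so the consumer sees the first output before the rest of the input
has been touched. StreamingSketch, lazily and processed are names invented for
this example; a real streaming pipeline (SAX events, for instance) would be
more involved.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class StreamingSketch {
    // Records which inputs have actually been processed so far.
    static final List<String> processed = new ArrayList<>();

    // Applies a stage lazily: an element is processed only when the
    // consumer asks for the next result.
    static <I, O> Iterator<O> lazily(Iterator<I> source,
                                     Function<? super I, ? extends O> stage) {
        return new Iterator<O>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public O next() { return stage.apply(source.next()); }
        };
    }

    static Iterator<String> pipeline(Iterator<String> source) {
        Function<String, Integer> length = s -> {
            processed.add(s);
            return s.length();
        };
        Function<Integer, String> describe = n -> "length=" + n;
        // The pipeline itself is plain function composition.
        return lazily(source, length.andThen(describe));
    }
}
```

Pulling one result from the returned iterator processes exactly one input
element; the remaining elements are left untouched until they are requested.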

This became more of a brain dump than a real answer to your e-mail, Jakob, but I
hope you (and others) have got my point.

-- 
Best regards,
Grzegorz Kossakowski
