On 14.08.2011 14:18, Sylvain Wallez wrote:
On 12/08/11 21:08, Thorsten Scherler wrote:
Hi all,

I am migrating a StAX development from a customer to c3 StAX, since the
resulting code will be much more generic and understandable.

In my case I need to process all files from different folders, parse
them and invoke a second pipeline from the main pipe.

Meaning I have one principal pipeline which I need to repeat x times.
I started to create the pipeline and it works very nicely; however, I
have run into some downsides when reusing the pipe.

I found that you can execute a Java-based pipe exactly once. There
is no method to reset the pipe. My plan was to inject the pipeline
into my main code and then configure it on the fly (reusing the same pipe
on different files).

Furthermore, there is no way to change the components dynamically
once they have been added to the pipe.

I mean

Pipeline<StAXPipelineComponent> pipeStAX = new NonCachingPipeline<StAXPipelineComponent>();
pipeStAX.addComponent(new XMLGenerator(input));
...
pipeStAX.setup(System.out);
pipeStAX.execute();

Now my question is how people feel about:
a) Making Java-based pipes resettable via pipeStAX.reset()
b) Adding a method like pipeStAX.getComponent(int i) to retrieve the
component at position i.


a) What exactly should Pipeline.reset() do? (Besides calling reset on each component)
And what should a component do during a reset?
I think components can be configured/set up as often as you like.

b) If you construct the components directly, can't you keep a reference to them and just call the setters/methods directly when needed? I guess I don't understand why the pipeline is not reusable in your case or what you need to reconfigure between the runs.
Maybe you need x different pipelines for x different configurations?
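To illustrate point b), here is a minimal sketch of keeping a reference to a component and calling its setter between runs. The types ToyPipeline, Step and PrefixStep are hypothetical stand-ins written for this example and are not the Cocoon 3 Pipeline API; the sketch says nothing about whether a real Cocoon pipeline may be executed more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only, NOT the Cocoon 3 Pipeline API.
class KeepReferenceSketch {

    /** A minimal component: transforms a string. */
    interface Step {
        String apply(String in);
    }

    /** A configurable component whose setter can be called between runs. */
    static class PrefixStep implements Step {
        private String prefix = "";

        void setPrefix(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String apply(String in) {
            return prefix + in;
        }
    }

    /** A toy pipeline that simply runs its steps in order. */
    static class ToyPipeline {
        private final List<Step> steps = new ArrayList<>();

        void addComponent(Step step) {
            steps.add(step);
        }

        String execute(String input) {
            String current = input;
            for (Step step : steps) {
                current = step.apply(current);
            }
            return current;
        }
    }

    /** Runs the pipeline once per folder, reconfiguring the kept reference each time. */
    static List<String> runForFolders(List<String> folders) {
        PrefixStep prefixer = new PrefixStep();   // keep the reference instead of looking it up by index
        ToyPipeline pipe = new ToyPipeline();
        pipe.addComponent(prefixer);
        pipe.addComponent(in -> in + ".out");     // a second, fixed step

        List<String> results = new ArrayList<>();
        for (String folder : folders) {
            prefixer.setPrefix(folder + "/");     // reconfigure through the setter
            results.add(pipe.execute("doc.xml"));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runForFolders(Arrays.asList("a", "b")));
    }
}
```

With the reference held by the calling code, no getComponent(int i) lookup is needed in this toy setup.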


Although reset() can allow pipeline reuse, it won't solve the problem when you have multiple concurrent threads that could benefit from reusing the pipeline.

Cocoon 2.x had component pools to allow reuse in a multithreaded context while avoiding the big cost of reparsing the component's configuration, but this proved to have a significant overhead.

A solution that wouldn't require many changes in the current API would be to require pipelines and pipeline components to be Cloneable, so that you could build a pipeline instance once at startup and then clone it each time you need to use it. That would require component writers to be careful about cloneability though.
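A rough sketch of the build-once, clone-per-use idea could look like the following. Everything here (ToyPipeline, Step, CountingStep, the copy() method) is a hypothetical stand-in invented for the example, not the existing Cocoon API. It assumes each component knows how to produce a copy with fresh per-run state, which is exactly the care component writers would have to take.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only, NOT the Cocoon 3 Pipeline API.
class CloneablePipelineSketch {

    /** A component that can produce an independent copy of itself. */
    interface Step {
        String apply(String in);

        Step copy();
    }

    /** A component with immutable configuration (label) and mutable per-run state (calls). */
    static class CountingStep implements Step {
        private final String label;
        private int calls = 0;

        CountingStep(String label) {
            this.label = label;
        }

        @Override
        public String apply(String in) {
            calls++;
            return in + "[" + label + "#" + calls + "]";
        }

        @Override
        public Step copy() {
            // Same configuration, fresh state.
            return new CountingStep(label);
        }
    }

    /** A toy pipeline whose clone() deep-copies its components. */
    static class ToyPipeline implements Cloneable {
        private List<Step> steps = new ArrayList<>();

        void addComponent(Step step) {
            steps.add(step);
        }

        String execute(String input) {
            String current = input;
            for (Step step : steps) {
                current = step.apply(current);
            }
            return current;
        }

        @Override
        public ToyPipeline clone() {
            try {
                ToyPipeline copy = (ToyPipeline) super.clone();
                // super.clone() is shallow, so replace the shared list with copied steps.
                copy.steps = new ArrayList<>();
                for (Step step : steps) {
                    copy.steps.add(step.copy());
                }
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Built once, e.g. at startup. */
    static ToyPipeline buildPrototype() {
        ToyPipeline prototype = new ToyPipeline();
        prototype.addComponent(new CountingStep("a"));
        return prototype;
    }

    public static void main(String[] args) {
        ToyPipeline prototype = buildPrototype();
        System.out.println(prototype.clone().execute("x"));
        System.out.println(prototype.clone().execute("x"));
    }
}
```

Each clone starts with its own counter, so two clones do not see each other's state. A component that forgot to copy its mutable fields would silently share them, which is the risk mentioned above.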

Sylvain


Pipelines are not thread-safe!
I think the effort required to make them thread-safe is far too great given the (IMO negligible) benefits. Since everyone can create their own pipeline components there is no way to guarantee that it will work correctly all the time. (I don't think "should work in multi-threaded environments if the component developer didn't make a mistake" should appear in any documentation)

In the case mentioned above (direct Pipeline API calls) component instances are created by the user's code, so the responsibility of doing that efficiently and correctly is the user's and not ours, IMO. Something like a component factory / provider is currently well outside the Pipeline API's responsibilities - actually it's part of the sitemap - and I think it should stay that way. I see the Pipeline API as a small library that provides some helpful classes, which you use in a very controlled and precise manner (like commons-lang, commons-io, etc.), not as a full execution environment with its own flow of control (you get that when you use cocoon-servlet with sitemaps).

If you really need/want more efficient construction of components, give this task to someone who specializes in that. Make a Spring context and use prototype beans or even create an object pool, or use some other dependency injection container you like. I don't think we should try to compete with those frameworks on their home field.
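As a plain-Java approximation of what a prototype-scoped bean would give you, the sketch below hands every task its own pipeline instance from a factory, so nothing is shared between threads. ToyPipeline and processAll are hypothetical names made up for this example; a real setup would let the container of your choice play the role of the Supplier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only, NOT the Cocoon 3 Pipeline API.
class PerTaskPipelineSketch {

    /** A toy pipeline with internal mutable state, so it is not safe to share between threads. */
    static class ToyPipeline {
        private final StringBuilder buffer = new StringBuilder();

        String execute(String input) {
            buffer.setLength(0);
            buffer.append("<out>").append(input).append("</out>");
            return buffer.toString();
        }
    }

    /** Processes all inputs concurrently; each task asks the factory for a fresh pipeline. */
    static List<String> processAll(List<String> inputs, Supplier<ToyPipeline> factory) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                // One new instance per task, similar in spirit to a prototype-scoped bean.
                Callable<String> task = () -> factory.get().execute(input);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());   // results come back in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "b", "c"), ToyPipeline::new));
    }
}
```

The pipeline class itself stays single-threaded and simple; how instances are created (plain constructor, DI container, object pool) is decided entirely by the caller.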

Steven
