Clemens Marschner wrote:
> The way the processing pipelines look should be configurable by the
> user. My first thought was that each processing step will become a component
> that can be configured through Avalon's configuration mechanism. I just
> don't know if that's right because in Phoenix this is up to the "application
> assembler", while in the crawler this may well be a user's task. Components
> may be dependent on global services, like a scheduler or a global host
> manager.

Just a comment on configuration. My own experience is that it is a good idea to get a clean separation of user-focused configuration information from system configuration. Generally speaking it is good to get configuration data out of the code and into a configuration source; however, this can become difficult to manage as the size of an application grows. Secondly, system configuration data in XML is often appropriate, whereas user-centered configuration data is much more likely to be presented in forms and to require supplementary resources for presentation, validation and updating.
There are several different approaches to dealing with this, and a number of configuration tools in the Excalibur package will make the job easier. In all cases, separating these concerns early in the development stage should prove valuable.
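
For example (just a sketch, the class, element and file names below are made up for this mail, not anything from your crawler): a pipeline stage could take its system-level settings from the Configuration the container hands it, and pull the user-editable settings out of a separate document via DefaultConfigurationBuilder, so the user-facing file can be generated and validated by whatever form-based tooling you like:

  import java.io.File;

  import org.apache.avalon.framework.configuration.Configurable;
  import org.apache.avalon.framework.configuration.Configuration;
  import org.apache.avalon.framework.configuration.ConfigurationException;
  import org.apache.avalon.framework.configuration.DefaultConfigurationBuilder;

  /**
   * Hypothetical crawler pipeline stage.  System-level settings arrive from
   * the container through configure(); user-editable settings live in their
   * own document that a form-based front end could generate and validate.
   */
  public class FetchStage implements Configurable
  {
      private int m_threads;        // system configuration
      private int m_maxDepth;       // user configuration
      private String m_userAgent;   // user configuration

      public void configure( final Configuration config )
          throws ConfigurationException
      {
          // system configuration assembled by the container
          m_threads = config.getChild( "threads" ).getValueAsInteger( 10 );

          // user configuration kept in a separate document so it can be
          // edited and validated without touching the system assembly
          final String path =
              config.getChild( "user-settings" ).getValue( "crawl-settings.xml" );
          try
          {
              final Configuration user =
                  new DefaultConfigurationBuilder().buildFromFile( new File( path ) );
              m_maxDepth = user.getChild( "max-depth" ).getValueAsInteger( 5 );
              m_userAgent = user.getChild( "user-agent" ).getValue( "MyCrawler/0.1" );
          }
          catch( final Exception e )
          {
              throw new ConfigurationException( "Unable to load user settings.", e );
          }
      }
  }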
> If you have a large web crawler it is likely that a crawler gets 100
> docs/second or more, and you end up with about 1500 extracted URLs and 1-2MB
> of documents per second. If you have a multi-threaded or multi-process
> system it means synchronization becomes an issue, and it is likely that you
> have to have several queues between the parts and exchange data in batch
> mode (e.g. every couple of seconds).

These sorts of issues have a direct relationship to the "lifestyle" of the component you use. In Avalon there are a number of "lifestyle" notions such as "pooled", "per-thread", "transient" or "singleton". Within "applications" (the sort of thing managed by Phoenix), the components are typically singletons. In containers handling fine-grained components (such as the Fortress and Merlin containers) you have the full spectrum of component lifestyles available to you. Providing you focus your component design on the framework interfaces (Configurable, Serviceable, Initializable, etc.), you will have no problem mixing and matching the containment solution that best meets the granularity of the problem you are dealing with.
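
As a rough sketch (again, the class names and the Scheduler service are mine, invented for illustration): a pipeline component written only against those framework interfaces looks exactly the same whether it ends up as a singleton block in a Phoenix assembly or as a pooled or per-thread component under Fortress or Merlin:

  import org.apache.avalon.framework.activity.Initializable;
  import org.apache.avalon.framework.configuration.Configurable;
  import org.apache.avalon.framework.configuration.Configuration;
  import org.apache.avalon.framework.configuration.ConfigurationException;
  import org.apache.avalon.framework.service.ServiceException;
  import org.apache.avalon.framework.service.ServiceManager;
  import org.apache.avalon.framework.service.Serviceable;

  /** Hypothetical global service, declared here only to keep the example self-contained. */
  interface Scheduler
  {
      String ROLE = Scheduler.class.getName();
      void schedule( Runnable task );
  }

  /**
   * Pipeline component written purely against the framework lifecycle
   * interfaces.  Nothing in it refers to a particular container, so the
   * same class can be declared a singleton in a Phoenix assembly or given
   * a pooled or per-thread lifestyle under Fortress or Merlin.
   */
  public class UrlFilterComponent
      implements Configurable, Serviceable, Initializable
  {
      private Scheduler m_scheduler;
      private String m_pattern;

      public void configure( final Configuration config )
          throws ConfigurationException
      {
          // step-specific settings supplied by whichever container deploys us
          m_pattern = config.getChild( "pattern" ).getValue( ".*" );
      }

      public void service( final ServiceManager manager )
          throws ServiceException
      {
          // global services (the scheduler, a host manager, ...) are looked up
          // by role rather than instantiated directly
          m_scheduler = (Scheduler) manager.lookup( Scheduler.ROLE );
      }

      public void initialize() throws Exception
      {
          // pre-compute anything needed before the component is put to work
      }
  }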
> We thought about using local queues for communication between threads, and
> JMS queues for communication between processes. In the end we want to
> provide different configurations for different needs:

This is something that is addressed within the Merlin container (excalibur/assembly). It provides support for the packaging of multiple component deployment profiles based on a meta information model (excalibur/meta). As Merlin is deployable within Phoenix as an application or a component, you can leverage the more comprehensive management functionality that Phoenix offers, while taking advantage of the additional lifestyle and profile features available from the excalibur/meta, excalibur/assembly, excalibur/container and excalibur/configuration packages.
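
Just to illustrate the profile idea (the XML below is invented for this mail; it is not the actual excalibur/meta descriptor schema, so please check the package documentation for the real format): the same component type could be packaged with one profile wired for in-JVM queues and another wired for JMS, and each installation simply selects the profile that matches its needs instead of being re-assembled by hand:

  <!-- illustrative only: not the actual excalibur/meta descriptor schema -->
  <profiles type="net.example.crawler.FetchStage">

    <!-- single-process crawl: hand-off between stages over in-memory queues -->
    <profile name="standalone">
      <configuration>
        <queue provider="memory" batch-size="500"/>
      </configuration>
    </profile>

    <!-- distributed crawl: hand-off between processes over JMS -->
    <profile name="distributed">
      <configuration>
        <queue provider="jms" destination="crawler.fetch" batch-size="500"/>
      </configuration>
    </profile>

  </profiles>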
Cheers, Steve.
--
Stephen J. McConnell
OSM SARL
digital products for a global economy
mailto:mcconnell@osm.net
http://www.osm.net