Re: Fw: LARM / Re: Avalonized WebCrawler

Stephen McConnell Fri, 31 Jan 2003 14:47:48 -0800


David Worms wrote:

I forwarded this email to the Avalon-users list in the hope that they will correct or build on our discussion. I often talk about Merlin without deep knowledge of it.
1. I wonder how ...crawl.fetcher is working, since there seem to be some
typos:
  - DefaultFetcherTaskFacotry.xinfo (o<->t)
    contains a reference to
      com.celavi.crawl.fetcher.FetcherTaskFacotry
    which doesn't exist
OK, you are right. It took me a while to understand what the .xinfo does before I finally reached the conclusion: nothing. I started to learn Fortress with the crawler (I used Phoenix before) and used the examples in the Fortress CVS, which contain the .xinfo files. But those files are only used by Phoenix (auto-generated via XDoclet) and Merlin.

Just for reference:

There are two variants of the .xinfo file. The original (supported by the Phoenix container and by Merlin) contains a <blockinfo> root element definition, which is basically the definition of the component type. The second is an .xinfo used by Merlin which contains a <type> root element plus a lot more information than the old blockinfo style. Fortress contains some experimental work related to the <type> definitions, but that will probably be moving out as we release Fortress and put more effort into the synchronization of Fortress and Merlin features.

2. Why crawl.Main and crawl.CrawlMain?
Here is the idea:
- crawl.Main is the entry point and a temporary hack. The "service" method in which I manually initialize one component after the other should not be there and will be removed at some point.
- crawl.CrawlMain has the ability to become a component of its own. Or maybe the term "block" is more appropriate. Both Phoenix and Merlin have this concept. A block can export different services. So our CrawlMain block initializes our (inner) crawl container (Merlin or Fortress) and makes its most relevant interface visible to a (super) (LARM) container (Merlin, Fortress, or Phoenix).

Some care should be applied to the term block. It was originally used inside Phoenix, then dropped, then reemerged recently. Basically the notion of a block in Phoenix is a component packaged with meta info (.xinfo containing <blockinfo>), deployed using meta data (assembly and configuration files in Phoenix), and deployed such that the component's service interfaces are available to other blocks that declare dependencies. In Merlin a component is the equivalent of a Phoenix Block, whereas a Merlin Block goes much further, in that a Block is an object that exposes services on the one hand, and has an implementation model made up of components with a container hierarchy on the other. At the end of the day a Merlin Block is a composite component.
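
To make the "composite component" idea a bit more concrete, here is a minimal plain-Java sketch. It does not use the Merlin or Phoenix APIs at all, and every name in it (CrawlService, Fetcher, LinkExtractor, CrawlBlock and the stub classes) is hypothetical. It only shows the shape: one object exposes a service to the outside, while its implementation is assembled from inner components that the outside never sees.

```java
// Hypothetical names throughout; this is not the Merlin or Phoenix API.

/** The service the block exposes to an outer container. */
interface CrawlService {
    String crawl(String url);
}

/** Inner components: visible only inside the block. */
interface Fetcher {
    String fetch(String url);
}

interface LinkExtractor {
    int countLinks(String page);
}

/** Stub that returns a fixed page instead of doing network I/O. */
class StubFetcher implements Fetcher {
    public String fetch(String url) {
        return "<a href='x'></a><a href='y'></a>";
    }
}

/** Counts opening anchor tags with a plain substring scan. */
class SimpleLinkExtractor implements LinkExtractor {
    public int countLinks(String page) {
        int count = 0;
        int idx = 0;
        while ((idx = page.indexOf("<a ", idx)) >= 0) {
            count++;
            idx += 3;
        }
        return count;
    }
}

/** The composite: exposes CrawlService, hides how it is assembled. */
class CrawlBlock implements CrawlService {
    private final Fetcher fetcher = new StubFetcher();
    private final LinkExtractor extractor = new SimpleLinkExtractor();

    public String crawl(String url) {
        String page = fetcher.fetch(url);
        return url + " -> " + extractor.countLinks(page) + " links";
    }
}
```

In a real container the inner components would be created and wired by the container from meta data rather than with "new" inside the block; the hard-wiring here is only to keep the sketch self-contained.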

3. Do you think dynamically configuring the whole pipeline from a config
file would be possible? The contents of com.celavi.crawl.Main.service()
should come from a config file, say pipeline_xy.xml (more than one pipeline
config should be possible, say crawler_pipeline and indexer_pipeline).
Depending on the contents of this file, another config file should contain
the config values for each component (say crawl_full, crawl_incrementally)

Yes, that should be possible. The pipeline I created is a very simple one. It is easy to configure as long as each stage implements the same "MessageListener" interface (with the additional lifecycles). You mention the ability to configure different pipelines. Interesting, I was just looking at this yesterday. I spent some time trying to find out what the hell this Excalibur "event" package could bring us: the promise of a SEDA architecture. But I am not sure how it all works. I have some sample code at http://67.116.155.180/~wdavidw/mySearch-event.zip.
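
As a rough illustration of the point about stages sharing one interface, here is a plain-Java sketch of a pipeline assembled from a list of stage names. It is not the crawler's actual code: the MessageListener signature, the two stage classes, the registry, and the use of String as the message type are all assumptions made for the example. The list of names stands in for whatever a file such as pipeline_xy.xml would contain once parsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Assumed stage contract: return the message to pass on, or null to drop it. */
interface MessageListener {
    String handleMessage(String message);
}

/** Hypothetical stage: normalizes the URL to lower case. */
class LowerCaseStage implements MessageListener {
    public String handleMessage(String message) {
        return message.toLowerCase(Locale.ROOT);
    }
}

/** Hypothetical stage: drops anything that is not an http:// URL. */
class HttpOnlyFilterStage implements MessageListener {
    public String handleMessage(String message) {
        return message.startsWith("http://") ? message : null;
    }
}

/** A pipeline is itself a MessageListener, so pipelines can be nested. */
class Pipeline implements MessageListener {
    private final List<MessageListener> stages = new ArrayList<>();

    /** Builds the stages in the order the (already parsed) config lists them. */
    static Pipeline fromStageNames(List<String> stageNames,
                                   Map<String, Supplier<MessageListener>> registry) {
        Pipeline pipeline = new Pipeline();
        for (String name : stageNames) {
            Supplier<MessageListener> factory = registry.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown stage: " + name);
            }
            pipeline.stages.add(factory.get());
        }
        return pipeline;
    }

    /** Passes the message through each stage; stops as soon as one drops it. */
    public String handleMessage(String message) {
        String current = message;
        for (MessageListener stage : stages) {
            current = stage.handleMessage(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

With this shape, a crawler_pipeline and an indexer_pipeline would just be two different name lists handed to the same builder, and the per-component config values could be applied to each stage right after it is created.
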
4. What is your rule of thumb what becomes a component and what stays a
class?
To me, a component is an instance that should be instantiated at application startup and that should be accessible to many other units (components). This is not how I would define a component, but it is the approach I took when I started to refactor code I didn't understand at the time (and there is still some stuff I am not familiar with).
I did not look at your code, take a breath, and work out how to decompose the system into components. Instead, I see any object that will be instantiated at application startup and destroyed at application shutdown as a candidate.
More or less, everything present in your "FetcherMain" object became a component.

I tend to look at the component/object choice as something more along the lines of "when I want structure and management - use a component", and "when I just want to do it - use an object". Sometimes going total component can be over-the-top, and I honestly believe there is a valid gray area where you're not actually dealing with pure components but leveraging component patterns. I.e. none of this is locked in stone.

5. Why Fortress and not a different container (just curious, I don't have
any preference)?

I learned Avalon with Phoenix first. Great, I love it. It is extremely easy to access Phoenix through AltRMI without a change in your code, and the same goes for configuring your app with JMX. However, what if we want the crawler embedded inside another application? Phoenix can only be run standalone. Here is where Merlin and Fortress can help. We can have our Fortress-based application run from a main method, inside a servlet, or even better, inside Phoenix as a block. I chose Fortress over Merlin because it is closer to a release.

6. It appears to me that Fortress is creating proxy components that act as
facades to the underlying component interfaces (am I right here?). This is
exactly what I wanted to avoid. It simply becomes too heavyweight (unless
we use typical component patterns). Since we may well create 100,000
URLMessages per second, it would kill us to send every call to
urlMessageFactory.createURLMessage through a proxy. I wonder if the other
available containers work the same way? (I know Phoenix doesn't do this)

I am not sure about this. Can someone help us? I think we should look at the component handlers (the lifestyle) in Fortress: the org.apache.excalibur.fortress.handler package.

In Fortress I think you can declare your own lifecycle handler - so you could kill the proxy stuff if you needed to. In Merlin this sort of thing will be declared in the component <type> definition and the container will handle it for you. In Phoenix I think you can also control proxy generation.
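
For reference, this is roughly what a proxy facade looks like when built with the standard java.lang.reflect.Proxy class. The thread does not confirm how Fortress generates its proxies, so treat this as a generic sketch of the pattern, not Fortress code. URLMessageFactory.createURLMessage is named in the question above, but its signature here (String in, String out) and the DefaultURLMessageFactory and ProxyFacadeDemo classes are assumptions made for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

/** Assumed signature; the real crawler interface may differ. */
interface URLMessageFactory {
    String createURLMessage(String url);
}

/** Hypothetical direct implementation. */
class DefaultURLMessageFactory implements URLMessageFactory {
    public String createURLMessage(String url) {
        return "URLMessage[" + url + "]";
    }
}

class ProxyFacadeDemo {
    /** Counts calls that went through the proxy, to make the indirection visible. */
    static final AtomicLong interceptedCalls = new AtomicLong();

    /** Wraps the target in a dynamic proxy that forwards every call reflectively. */
    static URLMessageFactory wrap(final URLMessageFactory target) {
        InvocationHandler handler = (proxy, method, args) -> {
            interceptedCalls.incrementAndGet();
            // Exceptions from the target arrive wrapped in InvocationTargetException;
            // a real facade would unwrap them before rethrowing.
            return method.invoke(target, args);
        };
        return (URLMessageFactory) Proxy.newProxyInstance(
                URLMessageFactory.class.getClassLoader(),
                new Class<?>[] { URLMessageFactory.class },
                handler);
    }
}
```

Every call on the wrapped factory pays for the handler dispatch and a reflective invoke on top of the real work, which is the per-call cost the question is worried about at 100,000 messages per second. Holding a direct reference to the implementation, where the container allows it, avoids that layer.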

Cheers, Steve.

--

Stephen J. McConnell
mailto:[EMAIL PROTECTED]
http://www.osm.net

