Re: Fw: LARM / Re: Avalonized WebCrawler

David Worms Fri, 31 Jan 2003 13:41:02 -0800

I forwarded this email to the Avalon-users list in the hope that they will correct or build on our discussion. In several places I am talking about Merlin without deep knowledge of it.

1. I wonder how ...crawl.fetcher is working, since there seem to be some
typos:
- DefaultFetcherTaskFacotry.xinfo (o<->t)
contains a reference to
com.celavi.crawl.fetcher.FetcherTaskFacotry
which doesn't exist

OK, you are right. It took me a while to understand what the .xinfo files do before I finally reached the conclusion: nothing, at least under Fortress. I started to learn Fortress with the crawler (I used Phoenix before) and used the examples present in the Fortress CVS, which contain the .xinfo files. But those files are only used by Phoenix (auto-generated via XDoclet) and Merlin.

2. Why crawl.Main and crawl.CrawlMain?

Here is the idea:
- crawl.Main is the entry point and a temporary hack. The "service" method, in which I manually initialize one component after the other, should not be there and will be removed at some point.
- crawl.CrawlMain has the ability to become a component of its own. Or maybe the term "block" is more appropriate. Both Phoenix and Merlin have this concept. A block can export different services. So our CrawlMain block initializes our (inner) crawl container (Merlin or Fortress) and makes its most relevant interface visible to a (super) (LARM) container (Merlin, Fortress, or Phoenix).

3. Do you think dynamically configuring the whole pipeline from a config
file would be possible? The contents of com.celavi.crawl.Main.service()
should come from a config file, say pipeline_xy.xml (more than one pipeline
config should be possible, say crawler_pipeline and indexer_pipeline).
Depending on the contents of this file, another config file should contain
the config values for each component (say crawl_full, crawl_incrementally)

Yes, that should be possible. The pipeline I created is a very simple one. It is easy to configure as long as each stage implements the same "MessageListener" interface (with the additional lifecycle interfaces). You mention the ability to configure different pipelines. Interesting, I was just looking at this yesterday. I spent some time trying to find out what the hell this Excalibur "event" package could bring us. It promises a SEDA architecture, but I am not sure how it all works. I have some sample code at http://67.116.155.180/~wdavidw/mySearch-event.zip.
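
A minimal sketch of what "configuring the pipeline from a file" could look like, assuming every stage implements one common interface. Everything here is hypothetical and only for illustration: the handle() signature, the stage classes, the <pipeline>/<stage name="..."> layout standing in for pipeline_xy.xml, and the name-to-factory map. It uses plain JDK classes rather than the Avalon configuration API, and it is written in current Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stage contract; the real MessageListener may look different.
interface MessageListener {
    // Returns the (possibly modified) message, or null to drop it.
    String handle(String message);
}

// Two toy stages, invented for this sketch.
class LowerCaseStage implements MessageListener {
    public String handle(String message) { return message.toLowerCase(Locale.ROOT); }
}

class HttpOnlyStage implements MessageListener {
    public String handle(String message) {
        return message.startsWith("http://") ? message : null;
    }
}

class Pipeline {
    private final List<MessageListener> stages = new ArrayList<>();

    // Builds the stage list in the order the <stage> elements appear in the XML.
    static Pipeline fromXml(String xml, Map<String, Supplier<MessageListener>> known) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse pipeline config", e);
        }
        NodeList nodes = root.getElementsByTagName("stage");
        Pipeline pipeline = new Pipeline();
        for (int i = 0; i < nodes.getLength(); i++) {
            String name = ((Element) nodes.item(i)).getAttribute("name");
            Supplier<MessageListener> factory = known.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("unknown stage: " + name);
            }
            pipeline.stages.add(factory.get());
        }
        return pipeline;
    }

    // Passes the message through every stage; stops as soon as a stage drops it.
    String process(String message) {
        for (MessageListener stage : stages) {
            if (message == null) return null;
            message = stage.handle(message);
        }
        return message;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        Map<String, Supplier<MessageListener>> known = new HashMap<>();
        known.put("lowercase", LowerCaseStage::new);
        known.put("http-only", HttpOnlyStage::new);
        String xml = "<pipeline><stage name='lowercase'/><stage name='http-only'/></pipeline>";
        Pipeline crawler = Pipeline.fromXml(xml, known);
        System.out.println(crawler.process("HTTP://Example.ORG/"));
        System.out.println(crawler.process("ftp://example.org/"));
    }
}
```

With this shape, a second pipeline (say an indexer pipeline) is just a second XML document handed to the same Pipeline.fromXml call; the per-component config values would still have to come from a separate file.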

4. What is your rule of thumb what becomes a component and what stays a
class?

To me, a component is an instance that should be instantiated at application startup and that should be accessible to many other units (components). This is not how I would define a component in general, but it is the approach I took when I started to refactor code I didn't understand at the time (and there are still some things I am not familiar with).
I did not try to look at your code, take a breath, and see how I could decompose the system into components. Instead, I treat any object that is instantiated at application startup and destroyed at application shutdown as a candidate.
More or less, everything present in your "FetcherMain" object became a component.

5. Why Fortress and not a different container (just curious, I don't have
any preference)?

I learned Avalon with Phoenix first. Great, I love it. It is extremely easy to access Phoenix through AltRMI without a change in your code, and the same goes for configuring your app with JMX. However, what if we want the crawler embedded inside another application? Phoenix can only be run standalone. Here is where Merlin and Fortress can help. We can have our Fortress-based application run from a main method, inside a servlet, or even better, inside Phoenix as a block. I chose Fortress over Merlin because it is closer to a release.

6. It appears to me that Fortress is creating proxy components that act as
facades to the underlying component interfaces (am I right here?). This is
exactly what I wanted to avoid. It simply becomes too heavy weighted (unless
we use typical component patterns). Since we may well create 100,000
URLMessages per second, it would kill us to send every call to
urlMessageFactory.createURLMessage through a proxy. I wonder if the other
available containers work the same way? (I know Phoenix doesn't do this)

I am not sure about this. Can someone help us? I think we should look at the component handlers (the lifestyles) in Fortress: the org.apache.excalibur.fortress.handler package.
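
To make the concern concrete, here is a small sketch of the general mechanism in question, using the JDK's java.lang.reflect.Proxy. It does not claim to show how Fortress wraps components; that is exactly the open question. The URLMessageFactory interface shape, the String return type, and the call counter are assumptions made up for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface shape; the real factory returns a URLMessage object.
interface URLMessageFactory {
    String createURLMessage(String url);
}

class DefaultURLMessageFactory implements URLMessageFactory {
    public String createURLMessage(String url) {
        return "URLMessage[" + url + "]";
    }
}

class ProxySketch {
    // Counts how many calls went through the reflective handler.
    static final AtomicLong interceptedCalls = new AtomicLong();

    // Wraps the target in a dynamic proxy, the way a container facade might.
    static URLMessageFactory wrap(URLMessageFactory target) {
        InvocationHandler handler = (proxy, method, args) -> {
            interceptedCalls.incrementAndGet();
            // Every call pays for this reflective dispatch.
            return method.invoke(target, args);
        };
        return (URLMessageFactory) Proxy.newProxyInstance(
                URLMessageFactory.class.getClassLoader(),
                new Class<?>[] { URLMessageFactory.class },
                handler);
    }

    public static void main(String[] args) {
        URLMessageFactory direct = new DefaultURLMessageFactory();
        URLMessageFactory proxied = wrap(direct);
        System.out.println(direct.createURLMessage("http://example.org/"));
        System.out.println(proxied.createURLMessage("http://example.org/"));
        System.out.println("calls through proxy: " + interceptedCalls.get());
    }
}
```

Both references return the same result; the difference is that the proxied one adds a reflective hop per call. A timing loop around the two calls would give a rough idea of whether that hop matters at the message rates mentioned in the question.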

7. As far as I can see, each MessageProcessor (State/MessageListener in your
terms) adds _itself_ to a message handler that it has to know about (as
defined in DefaultMessageListenerSelector.xinfo). Doesn't this violate the
IoC pattern? Shouldn't an external component initialize the message handler
with the listeners according to a defined order? (the order is at the moment
given only implicitly by the order the config files are processed).

You are right. It is the logical move. At first, each stage registered itself with the MessageHandler. Then I introduced the MessageListenerSelector, which instantiates each stage and then registers it. Now the MessageHandler should register the stages itself by calling MessageListenerSelector.selectAll() during its own initialization.
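
The last step described above could be sketched roughly as follows. Only the names MessageHandler, MessageListenerSelector, and selectAll() come from the discussion; the method signatures, the plain initialize() method standing in for the Avalon lifecycle call, and the RecordingStage class are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stage contract: returns the message, or null to drop it.
interface MessageListener {
    String handle(String message);
}

// Owns the stages and their order; stages no longer know about the handler.
class MessageListenerSelector {
    private final List<MessageListener> ordered;

    MessageListenerSelector(List<MessageListener> ordered) {
        this.ordered = new ArrayList<>(ordered);
    }

    // Returns all stages in their configured order (assumed signature).
    List<MessageListener> selectAll() {
        return Collections.unmodifiableList(ordered);
    }
}

class MessageHandler {
    private final MessageListenerSelector selector;
    private final List<MessageListener> stages = new ArrayList<>();

    // The selector is handed in from outside (by the container in real code).
    MessageHandler(MessageListenerSelector selector) {
        this.selector = selector;
    }

    // Stand-in for the lifecycle initialize(): the handler pulls the stages itself.
    void initialize() {
        stages.addAll(selector.selectAll());
    }

    void dispatch(String message) {
        for (MessageListener stage : stages) {
            if (message == null) return;
            message = stage.handle(message);
        }
    }
}

// Toy stage that records what it saw, so the call order is visible.
class RecordingStage implements MessageListener {
    private final String name;
    private final List<String> log;

    RecordingStage(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    public String handle(String message) {
        log.add(name + ":" + message);
        return message;
    }
}

class HandlerDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MessageListenerSelector selector = new MessageListenerSelector(Arrays.asList(
                new RecordingStage("filter", log), new RecordingStage("store", log)));
        MessageHandler handler = new MessageHandler(selector);
        handler.initialize();
        handler.dispatch("http://example.org/");
        System.out.println(log);
    }
}
```

The stage order is now explicit in one place (the list given to the selector) instead of depending on the order in which config files happen to be processed.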

Still trying to figure out a lot of things... I am really learning a lot from your code...

David
