Great news, this will push us forward! Will have a look at it immediately (after breakfast, of course! :-)
Clemens

----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>; "Avalon framework users" <[EMAIL PROTECTED]>
Sent: Tuesday, January 28, 2003 12:55 AM
Subject: Re: Avalonized WebCrawler

> Oh, no need to swallow any pride - some of us have been meaning to do
> this.....when we have more time...hah.
> So just a big thank you from us!
>
> Otis
>
>
> --- Paul Hammant <[EMAIL PROTECTED]> wrote:
> > David,
> >
> > Great work. I sure hope the Lucene peeps can swallow (a little) pride
> > and merge the best bits. It is always difficult receiving a mountain of
> > changes...
> >
> > I look forward to using some of the components outside Lucene, and the
> > whole thing inside Phoenix when you have it ready :-)))
> >
> > - Paul H
> > (hammant@apache)
> >
> > > Lucene developers,
> > >
> > > This mail follows a few threads which took place 2-3 months ago on both
> > > the Lucene and Avalon lists:
> > >
> > > http://marc.theaimsgroup.com/?l=lucene-dev&m=101518595918785&w=2
> > > http://marc.theaimsgroup.com/?l=avalon-users&m=103706452017829&w=2
> > >
> > > They were related to porting the WebCrawler app into a component-based
> > > application using Avalon. During the past few days, I did just that
> > > and I will be happy to share the code with the community. There is
> > > still a lot to do, but my goal was to contact you once the code reached
> > > a similar level of development as the one in CVS. I did not contact
> > > the list before because I wasn't sure where I was going :), and because
> > > I do not have CVS access at Apache.
> > >
> > > You can download the code @ http://67.116.155.180/~wdavidw/crawler.zip
> > >
> > > Both the sources and binaries are present. On my local environment, I
> > > use Maven as the build system.
> > > It isn't included in the download
> > > because some of the jars I used are recent CVS snapshots not present on
> > > the Maven remote location (ibiblio.org). If I am not mistaken, all the
> > > required libraries are present in the zip file.
> > >
> > > Overall, the code behaves just like the present crawler hosted on the
> > > Lucene Sandbox repository. Since I mostly did some refactoring on
> > > this code base, it will be quite easy for the developer(s) to find out
> > > what happens. All the comments, methods, etc. remain the same. I
> > > only changed the most relevant parts. You will find the code divided
> > > into 2 packages, the original package "de.lanlab.*" and the new one
> > > "org.crawl.*". The reason behind this separation is that every time I
> > > created a new component, I moved its code into the second package for
> > > clarity.
> > >
> > > As the Avalon container, I chose to use Fortress. It is a stable and
> > > almost released container (a matter of weeks). I am seriously thinking
> > > about Merlin, but it is no priority for now.
> > >
> > > Here is a list of the created components/services:
> > >
> > > fetcher-task-factory
> > > host-manager
> > > host-resolver
> > > url-message-factory
> > > web-document-factory
> > > message-handler
> > > message-listener-selector
> > > . url-length-stage
> > > . url-scope-stage
> > > . robot-exclusion-stage
> > > . url-visited-stage
> > > . known-path-stage
> > > . fetcher-stage
> > > storage-pipeline
> > > thread-monitor
> > > fetcher-thread-factory
> > > server-thread-factory
> > > url-normalizer
> > > url-visited-manager
> > > one more to appear: thread-pool-manager
> > >
> > > Configuration:
> > > At this time, every config property is hard-coded in the component
> > > class. It will be a fast and easy task to integrate the config file
> > > because the components already implement the Avalon configuration
> > > lifecycle.
> > >
> > > Logging:
> > > I had a hard time using the Fortress logging service. For now, only two
> > > loggers are working, one for the Fortress system, the other for the
> > > crawler. Once I understand where the logging issue is coming from,
> > > each component could have its own logger without any code changes.
> > >
> > > Integration:
> > > Fortress can easily be plugged into any type of environment or run as a
> > > standalone application. I am planning to write a Phoenix block soon.
> > >
> > > Client connection:
> > > The current Observer service will change completely. Instead of
> > > printing information to the console, it will export some sort of
> > > application state descriptor object via AltRMI, or anything else. It
> > > will be up to the client to render that information.
> > >
> > > Speed:
> > > When running the current code against the Avalonized one, I get very
> > > similar speed results. The only difference is that it takes somewhat
> > > longer for the new one to reach a stable speed (about 15 seconds).
> > >
> > > Avalon:
> > > I kept to a simplistic use of Avalon. For now, I didn't want to
> > > use all the tools available. There are a few domains where Avalon could
> > > provide more functionality:
> > > - the lifestyle handler (both in Fortress and Merlin), which could
> > >   replace the usage of factories, for example.
> > > - the thread library, because I didn't want to change any of the
> > >   current code.
> > > - the event library, which will reinforce a SEDA architecture.
> > >
> > > Javadocs:
> > > None, I kept the ones present in the past. I will describe every
> > > service in more detail soon, when I finish with all the refactoring.
> > >
> > > Lucene:
> > > I think Lucene should be separated from the crawler. One could easily
> > > write a service which will schedule the crawling process and export the
> > > results.
> > > Then, this service could use those results to create/update a
> > > Lucene index.
> > >
> > > Future:
> > > I am committed to pursuing the development of the crawler. I hope many
> > > current and future developers will follow me. With your consent, I
> > > would likely move this project to SourceForge, but all opinions are
> > > welcome.
> > >
> > > David
> > >
> > > --
> > > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
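
A note on the stage components named in David's list (url-length-stage, url-scope-stage, url-visited-stage, and so on): the names read like a chain of filters that a URL passes through before it is fetched. The sketch below illustrates that idea only. It assumes each stage either passes a URL on or drops it; the UrlStage interface, the StageChain class, the constructor parameters, and the example.org URLs are all hypothetical and are not taken from crawler.zip or from the Avalon APIs. The robot-exclusion, known-path, and fetcher stages are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stage contract: return true to pass the URL on, false to drop it. */
interface UrlStage {
    boolean accept(String url);
}

/** Drops URLs longer than a fixed limit (stand-in for "url-length-stage"). */
class UrlLengthStage implements UrlStage {
    private final int maxLength;

    UrlLengthStage(int maxLength) {
        this.maxLength = maxLength;
    }

    public boolean accept(String url) {
        return url.length() <= maxLength;
    }
}

/** Keeps only URLs under a given prefix (stand-in for "url-scope-stage"). */
class UrlScopeStage implements UrlStage {
    private final String prefix;

    UrlScopeStage(String prefix) {
        this.prefix = prefix;
    }

    public boolean accept(String url) {
        return url.startsWith(prefix);
    }
}

/** Drops URLs that were already accepted once (stand-in for "url-visited-stage"). */
class UrlVisitedStage implements UrlStage {
    private final Set<String> visited = new HashSet<>();

    public boolean accept(String url) {
        // Set.add returns false when the URL is already present.
        return visited.add(url);
    }
}

/** Runs the stages in order and stops at the first one that rejects the URL. */
class StageChain {
    private final List<UrlStage> stages = new ArrayList<>();

    StageChain add(UrlStage stage) {
        stages.add(stage);
        return this;
    }

    boolean process(String url) {
        for (UrlStage stage : stages) {
            if (!stage.accept(url)) {
                return false;
            }
        }
        return true;
    }
}
```

Putting the cheap checks (length, scope) before the visited check means a URL rejected early is never recorded as visited. Whether the real crawler orders its stages this way is not stated in the message.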