Oh, no need to swallow any pride; some of us have been meaning to do this... when we have more time... hah. So just a big thank you from us!
Otis

--- Paul Hammant <[EMAIL PROTECTED]> wrote:
> David,
>
> Great work. I sure hope the Lucene peeps can swallow (a little) pride
> and merge the best bits. It is always difficult receiving a mountain of
> changes...
>
> I look forward to using some of the components outside Lucene, and the
> whole thing inside Phoenix when you have it ready :-)))
>
> - Paul H
> (hammant@apache)
>
> > Lucene developers,
> >
> > This mail follows up on a few threads that took place 2-3 months ago on
> > both the Lucene and Avalon lists:
> >
> > http://marc.theaimsgroup.com/?l=lucene-dev&m=101518595918785&w=2
> > http://marc.theaimsgroup.com/?l=avalon-users&m=103706452017829&w=2
> >
> > They were about porting the WebCrawler app into a component-based
> > application using Avalon. During the past few days I did just that,
> > and I will be happy to share the code with the community. There is
> > still a lot to do, but my goal was to contact you once the code reached
> > a level of development similar to the one in CVS. I did not contact
> > the list earlier because I wasn't sure where I was going :), and because
> > I do not have CVS access at Apache.
> >
> > You can download the code @ http://67.116.155.180/~wdavidw/crawler.zip
> >
> > Both the sources and binaries are included. In my local environment I
> > use Maven as the build system. The Maven build isn't included in the
> > download because some of the jars I used are recent CVS snapshots that
> > are not present in the Maven remote repository (ibiblio.org). If I am
> > not mistaken, all the required libraries are present in the zip file.
> >
> > Overall, the code behaves just like the current crawler hosted in the
> > Lucene Sandbox repository. Since I mostly did some refactoring on this
> > code base, it will be quite easy for the developer(s) to find out what
> > happens. All the comments, methods, etc. remain the same; I only
> > changed the most relevant parts.
> > You will find the code divided into 2 packages: the original package
> > "de.lanlab.*" and the new one "org.crawl.*". The reason for this
> > separation is that every time I created a new component, I moved its
> > code into the second package for clarity.
> >
> > As the Avalon container, I chose Fortress. It is a stable and almost
> > released container (a matter of weeks). I am seriously thinking about
> > Merlin, but it is not a priority for now.
> >
> > Here is a list of the created components/services:
> >
> > fetcher-task-factory
> > host-manager
> > host-resolver
> > url-message-factory
> > web-document-factory
> > message-handler
> > message-listener-selector
> > . url-length-stage
> > . url-scope-stage
> > . robot-exclusion-stage
> > . url-visited-stage
> > . known-path-stage
> > . fetcher-stage
> > storage-pipeline
> > thread-monitor
> > fetcher-thread-factory
> > server-thread-factory
> > url-normalizer
> > url-visited-manager
> > one more to appear: thread-pool-manager
> >
> > Configuration:
> > At this time, every config property is hard-coded in the component
> > class. Integrating the config file will be a fast and easy task,
> > because the components already implement the Avalon configuration
> > lifecycle.
> >
> > Logging:
> > I had a hard time using the Fortress logging service. For now, only two
> > loggers are working: one for the Fortress system, the other for the
> > crawler. Once I understand where the logging issue is coming from,
> > each component could have its own logger without any code changes.
> >
> > Integration:
> > Fortress can easily be plugged into any type of environment or run as a
> > standalone application. I am planning to write a Phoenix block soon.
> >
> > Client connection:
> > The current Observer service will change completely. Instead of
> > printing information to the console, it will export some sort of
> > application state descriptor object via AltRMI, or anything else.
> > It will be up to the client to render that information.
> >
> > Speed:
> > When running the current code against the Avalonized one, I get very
> > similar speed results. The only difference is that the new one takes
> > somewhat longer to reach a stable speed (about 15 seconds).
> >
> > Avalon:
> > I kept my use of Avalon simple. For now, I didn't want to use all the
> > tools available. There are a few areas where Avalon could provide more
> > functionality:
> > - the lifestyle handler (in both Fortress and Merlin), which could
> >   replace the use of factories, for example.
> > - the thread library, which I have not adopted yet because I didn't
> >   want to change any of the current code.
> > - the event library, which would reinforce a SEDA architecture.
> >
> > Javadocs:
> > None beyond the existing ones, which I kept. I will describe every
> > service in more detail soon, when I finish all the refactoring.
> >
> > Lucene:
> > I think Lucene should be separated from the crawler. One could easily
> > write a service that schedules the crawling process and exports the
> > results. Then this service could use those results to create/update a
> > Lucene index.
> >
> > Future:
> > I am committed to pursuing the development of the crawler. I hope many
> > current and future developers will follow me. With your consent, I
> > would probably move this project to SourceForge, but all opinions are
> > welcome.
> >
> > David