On 1/22/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Finally, web crawling, indexing and searching are data-intensive. Before long, users will want to index tens or hundreds of millions of pages. Distributed operation is soon required at this scale, and batch mode is an order of magnitude faster. So be careful before you throw those features out: you might want them back soon.

Doug

As a developer building applications on top of Nutch, my experience is that I can't go back to version 0.7x, because the features in 0.8/0.9 are so badly needed even for non-distributed crawling/indexing. For example, I can run crawling/indexing on a Linux server and a Windows laptop separately, and merge the newly crawled databases into the main crawldb. As I remember, v0.7 can't merge separate crawldbs without lots of customization.

It may take some time to switch from 0.7x to 0.8/0.9, especially if you have lots of customization code. But once you get over this one hurdle, you will enjoy the new and better features in 0.8/0.9. Also, this may be the time to rethink the design of your application. For my own project, I always try to separate my code from the Nutch core code as much as possible, so that I can easily upgrade the application to keep up with new Nutch releases. Staying away from the newest Nutch version seems backward to me.

AJ

--
AJ Chen, PhD
Palo Alto, CA
http://web2express.org