Please do not cross-post questions!
Check out the mapred (MapReduce) branch in SVN. It does everything
you are looking for, and it works well for me.
Stefan
On 04.11.2005 at 14:32, Arsen Popovyan wrote:
At the moment we are using a Nutch nightly build (nutch-2005-07-20).
We are not satisfied with the throughput of fetching, parsing,
indexing, analyzing, and scoring. Our spider currently retrieves
approx. 25,000 new results per day. All processes run on a single
machine, and we use the local file system. We assume that to increase
throughput we need to move to a cluster.
1) Are there any ready-made intermediate solutions (e.g. for storage)
for running Nutch on a cluster?
2) Does anyone have experience running Nutch on a cluster? If so,
what throughput was achieved, and how many machines were used?
3) Which tasks can be distributed across different machines, and
which cannot? How should those tasks be synchronized?
4) Will the spider run faster if we use NutchDistributedFileSystem?
What are the advantages and disadvantages of using it?
5) We were advised to use the Nutch mapred branch. Should we use it?