Scott Simpson wrote:
Excuse my ignorance on this issue. Say I have 5 machines in my Hadoop
cluster and I only list two of them in the configuration file when I do a
fetch or a generate. Won't this just store the data on the two nodes
since that is all I've listed for my crawling machines? I'm
Doug Cutting wrote:
Scott Simpson wrote:
Suppose I want to run Nutch 0.8 searches on machines separate from the ones I
crawl on. Is there a way to separate this so my crawling operation
(MapReduce) doesn't happen on my search machines?
You could have two different configuration directories and set