Doug Cutting wrote:
> Scott Simpson wrote:
>> Suppose I want to run Nutch 0.8 searches on machines separate from the
>> ones I crawl on. Is there a way to split this up so my crawling operation
>> (MapReduce) doesn't happen on my search machines?
> You could have two different configuration directories and set
> HADOOP_CONF_DIR (or use cd).

Excuse my ignorance on this issue. Say I have 5 machines in my Hadoop cluster
and I list only two of them in the configuration file when I do a "fetch" or a
"generate". Won't that store the data on just those two nodes, since they are
all I've listed as crawling machines? I'm trying to crawl on two machines but
store the data across all five.
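To make the two-configuration-directory suggestion concrete, here is a minimal sketch. The directory names and hostnames are hypothetical; the only real pieces are the HADOOP_CONF_DIR environment variable and the slaves file, which lists the machines a cluster's daemons run on.

```shell
# Sketch: one conf directory per role (paths and hostnames are hypothetical).
mkdir -p conf.crawl conf.search

# Slaves file for the crawl side: only the two fetch machines.
cat > conf.crawl/slaves <<'EOF'
crawler1.example.com
crawler2.example.com
EOF

# Slaves file for the search side: the search machines.
cat > conf.search/slaves <<'EOF'
search1.example.com
search2.example.com
search3.example.com
EOF

# Point Hadoop at the crawl configuration before launching crawl jobs:
export HADOOP_CONF_DIR="$PWD/conf.crawl"
# ...then run the generate/fetch commands with that configuration in effect
# (commands elided here).
```

Note this only controls which machines the daemons started from each configuration run on; it does not by itself split "where jobs run" from "where data is stored" within a single cluster.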
