Teruhiko Kurosaka wrote:
Can I use MapReduce to run Nutch on a multi-CPU system?
Yes.
I want to run the index job on two (or four) CPUs on a single system. I'm not trying to distribute the job over multiple systems. If MapReduce is the way to go, do I just specify config parameters like these:
  mapred.tasktracker.tasks.maximum=2
  mapred.job.tracker=localhost:9001
  mapred.reduce.tasks=2 (or 1?)
and then run bin/start-all.sh?
That should work. You'd probably want to set the default number of map tasks to a multiple of the number of CPUs, and the number of reduce tasks to exactly the number of CPUs.
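As a rough illustration of that rule of thumb, the small Python sketch below derives the values from a CPU count. It is not part of Nutch: the helper name suggest_task_settings and the maps_per_cpu multiplier (default 2) are hypothetical, and mapred.map.tasks is assumed to be the property that holds the default number of map tasks. Setting mapred.tasktracker.tasks.maximum to the CPU count simply mirrors the value proposed in the question.

```python
import os


def suggest_task_settings(cpus, maps_per_cpu=2):
    """Return suggested MapReduce settings for one machine with `cpus` CPUs.

    maps_per_cpu is an illustrative assumption; any positive multiple
    satisfies the "multiple of the number of CPUs" guideline.
    """
    if cpus < 1 or maps_per_cpu < 1:
        raise ValueError("cpus and maps_per_cpu must be positive")
    return {
        # Default number of map tasks: a multiple of the CPU count.
        "mapred.map.tasks": cpus * maps_per_cpu,
        # Number of reduce tasks: exactly the CPU count.
        "mapred.reduce.tasks": cpus,
        # Concurrent tasks per tasktracker, as proposed in the question.
        "mapred.tasktracker.tasks.maximum": cpus,
    }


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single CPU.
    detected = os.cpu_count() or 1
    for name, value in suggest_task_settings(detected).items():
        print(f"{name}={value}")
```

The printed name=value pairs still have to be copied into the site configuration by hand; the script only does the arithmetic.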
Don't use start-all.sh, but rather just:
  bin/nutch-daemon.sh start tasktracker
  bin/nutch-daemon.sh start jobtracker
Must I use NDFS for MapReduce?
No.

Doug