Hello Team,

I have started learning Hadoop MapReduce and have set up a single-node execution environment.
I now want to extend this to a multi-node environment. I have the following questions, and it would be very helpful if somebody could answer them:

1. For multiple nodes, I understand I should add the URLs of the secondary nodes to slaves.xml. Am I correct?
2. What needs to be installed on the secondary nodes for them to execute a job/task?
3. I understand I can set the map/reduce classes as a jar on the Job, through the JobConf. Does this mean I need not install/copy my map/reduce code on all the secondary nodes?
4. How do I route the data to these nodes? Is MapReduce required to execute on the machines where the data is stored (DFS)?

Any samples or suggestions for doing this would help.

Regards,
Girish
Ph: +91-9916212114
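P.S. For context on question 3, here is roughly what my current single-node driver looks like. This is a minimal WordCount-style sketch using the old org.apache.hadoop.mapred API; the class names, job name, and the input/output paths taken from args are just placeholders from my test setup. My assumption is that passing the driver class to the JobConf constructor is what tells the framework which jar to ship to the task nodes:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCountDriver {

    // Minimal mapper: emits (word, 1) for every token in the input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Minimal reducer: sums the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Passing the driver class here is (I believe) what makes Hadoop locate
        // and distribute the enclosing jar to the task nodes.
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}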