I have been experimenting with that, and when I do, the master saturates
well before the slave nodes, and the jobs start experiencing timeouts.

The map task in question is the IdentityMapper. The job is a simple
merge sort that combines data by key where there are duplicate keys in
the input stream.

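To make the shape of that job concrete, here is a minimal sketch of the merge-by-key behaviour in plain Java, with no Hadoop classes involved. The class name MergeByKeySketch and the method mergeByKey are hypothetical, and the in-memory TreeMap only stands in for the sorting and grouping that the framework does between the map and reduce phases; the actual job code is not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeByKeySketch {
    // Hypothetical helper, not a Hadoop API: each record is a {key, value} pair.
    // Records pass through unchanged (as an identity map step would leave them),
    // then values that share a key are collected under that key, in key order.
    public static Map<String, List<String>> mergeByKey(List<String[]> records) {
        Map<String, List<String>> merged = new TreeMap<>();
        for (String[] record : records) {
            merged.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record[1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"b", "2"});
        input.add(new String[] {"a", "1"});
        input.add(new String[] {"a", "3"}); // duplicate key "a"
        System.out.println(mergeByKey(input));
    }
}
```

The duplicate key "a" ends up with both of its values grouped together, and the keys come out in sorted order. That is the only work the job asks of the cluster, which is why the map side itself should be cheap.
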
There is no swapping going on in my cluster. The machines in question
are all 8-processor boxes, and tasks.maximum was set to 6.

task_200712261033_0002_m_000078_0: Exception in thread "main" java.net.SocketTimeoutException: timed out waiting for rpc response
task_200712261033_0002_m_000078_0: at org.apache.hadoop.ipc.Client.call(Client.java:484)
task_200712261033_0002_m_000078_0: at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
task_200712261033_0002_m_000078_0: at org.apache.hadoop.mapred.$Proxy0.getTask(Unknown Source)
task_200712261033_0002_m_000078_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1747)
07/12/26 10:48:03 INFO mapred.JobClient: Task Id : task_200712261033_0002_m_000081_1, Status : FAILED
- Do people put their master node in the slave list - 0.... Jason Venner