Greetings people,

Well, lately, in any Hadoop flow I'm running, I encounter a 3 minutes, 10
second delay for a certain map node (master working as slave). After that
initialization delay, it goes back to normal and executes instantly.

For example, when running QuasiMonteCarlo example:

Task Id Start Time Finish Time
attempt_201204101957_0006_m_000003_0 10/04 20:14:54 10/04 20:18:05 (3mins,
10sec) /default-rack/master

2012-04-10 20:18:04,470 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-04-10 20:18:04,646 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2012-04-10 20:18:04,647 WARN org.apache.hadoop.conf.Configuration:
user.nameis deprecated. Instead, use
mapreduce.job.user.name
2012-04-10 20:18:04,751 INFO org.apache.hadoop.mapreduce.util.ProcessTree:
setsid exited with exit code 0
2012-04-10 20:18:04,754 INFO org.apache.hadoop.mapred.Task: Using
ResourceCalculatorPlugin :
org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@79ee2c2c
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: (EQUATOR) 0
kvi 26214396(104857584)
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask:
mapreduce.task.io.sort.mb: 100
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: soft limit
at 83886080
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: bufstart =
0; bufvoid = 104857600
2012-04-10 20:18
:04,912 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length =
6553600 2012-04-10 20:18:04,939 INFO org.apache.hadoop.mapred.MapTask:
Starting flush of map output
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: Spilling map
output
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: bufstart =
0; bufend = 18; bufvoid = 104857600
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: kvstart =
26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2012-04-10 20:18:04,972 INFO org.apache.hadoop.mapred.MapTask: Finished
spill 0
2012-04-10 20:18:04,975 INFO org.apache.hadoop.mapred.Task:
Task:attempt_201204101957_0006_m_000003_0 is done. And is in the process of
commiting
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.Task: Task
'attempt_201204101957_0006_m_000003_0' done.

Task tracker log is more telling:

2012-04-10 *20:14:54,615* INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 1 and trying to launch
attempt_201204101957_0006_m_000003_0 which needs 1 slots
2012-04-10 20:14:54,685 INFO org.apache.hadoop.mapred.JvmManager: JVM
Runner jvm_201204101957_0006_m_377512887 spawned.
2012-04-10 20:16:34,041 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 1
2012-04-10 *20:18:04,433* INFO org.apache.hadoop.mapred.TaskTracker: JVM
with ID: jvm_201204101957_0006_m_377512887 given task:
attempt_201204101957_0006_m_000003_0
2012-04-10 20:18:04,938 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201204101957_0006_m_000003_0 0.0%
2012-04-10 20:18:05,056 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201204101957_0006_m_000003_0 0.667% Generated 1000 samples.

sort
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_201204101957_0006_m_000003_0 is done.
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_201204101957_0006_m_000003_0 was 28
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2012-04-10 20:18:05,213 INFO org.apache.hadoop.mapreduce.util.ProcessTree:
Sending signal to all members of process group -23030: SIGTERM. Exit code 1
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
28 bytes to reduce 0 from map: attempt_201204101957_0006_m_000003_0 given
28/24
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker: Shuffled
1maps (mapIds=attempt_201204101957_0006_m_000003_0) to reduce 0 in 29s
2012-04-10 20:18:08,478 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 147.102.7.173:50060,
dest: 147.102.7.175:57289, maps: 1, op: MAPRED_SHUFFLE, reduceID: 0,
duration: 29
2012-04-10 20:18:10,217 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201204101957_0006_m_377512887 exited with exit code 0. Number of tasks
it ran: 1

I suspect a network issue here, but I can ping and ssh with no problem.


Thank you in advance,

Nikos Stasinopoulos

Reply via email to