On 8/18/14, 1:26 AM, Tsuyoshi OZAWA wrote:
Hi,
That is not the correct behaviour for the normal runs.
The 10 minute gap sounds a lot like AM recovery is kicking in.
This will happen if the application master is killed half-way through a
job and it recovers work from the HDFS recovery logs.
The most common case for me is when the AM container size and java opts
are not in sync.
Can you check if that's happening in cases with only just a single AM
attempt?
Cheers,
Gopal
I'm trying some jobs in examples of tez. I found that sometimes jobs
get hung-up on my distributed environment.
Because of this behavior, the jobs on tez get slow down and sometimes
get slower than original MapReduce jobs(e.g. if I run
OrderedWordCount, WordCount + Sort in MapReduce is faster than
OrderedWordCount on Tez sometimes). Is this correct behavior? Do you
know how can we solve it or tune Tez program? I attached logs at the
time as follows:
<--- log1 - job start up time -->
14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0
Killed: 0
14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Tokenizer Progress: 0% TotalTasks: -1 Succeeded: 0 Running: 0 Failed:
0 Killed: 0
14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Summation Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0
Killed: 0
-- next print is 10 minutes later --
14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
Progress: 1.04% TotalTasks: 96 Succeeded: 1 Running: 52 Failed: 0
Killed: 0
14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0
Killed: 0
14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Tokenizer Progress: 3.37% TotalTasks: 89 Succeeded: 3 Running: 52
Failed: 0 Killed: 0
14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Summation Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
<--- log1 -->
<--- log2 -->
14/08/18 16:35:15 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
Progress: 92.71% TotalTasks: 96 Succeeded: 89 Running: 6 Failed: 0
Killed: 0
14/08/18 16:35:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Tokenizer Progress: 100% TotalTasks: 89 Succeeded: 89 Running: 0
Failed: 0 Killed: 0
14/08/18 12:19:25 INFO rpc.DAGClientRPCImpl: VertexStatus:
VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded:
0 Running: 0 Failed: 0 Killed: 0
14/08/18 16:35:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Summation Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 6 Failed: 0
Killed: 0
--- next print is 10 minutes later --
14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
Progress: 93.75% TotalTasks: 96 Succeeded: 90 Running: 5 Failed: 0
Killed: 0
14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0
Killed: 0
14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Tokenizer Progress: 100% TotalTasks: 89 Succeeded: 89 Running: 0
Failed: 0 Killed: 0
14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
Summation Progress: 16.67% TotalTasks: 6 Succeeded: 1 Running: 5
Failed: 0 Killed: 0
<--- log2 -->
Thanks,
- Tsuyoshi