[ 
https://issues.apache.org/jira/browse/TEZ-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106454#comment-14106454
 ] 

Siddharth Seth commented on TEZ-1476:
-------------------------------------

[~jeagles] - thanks for changing MRRSleep; it was getting annoying trying to 
find an example that reproduces this easily. I finally went with TestLocalMode.

There's a lot broken in the current monitoring code. 
waitForCompletionWithStatusUpdates is supposed to print vertex-specific 
information on each line. Instead I'm seeing the following output, which is 
missing all vertex-level information:
{code}
2014-08-21 18:28:05,679 INFO  [main] rpc.DAGClientRPCImpl 
(DAGClientRPCImpl.java:log(428)) - DAG: State: RUNNING Progress: 20.99% 
TotalTasks: 2792 Succeeded: 586 Running: 237 Failed: 0 Killed: 0
2014-08-21 18:28:09,195 INFO  [main] rpc.DAGClientRPCImpl 
(DAGClientRPCImpl.java:log(428)) - DAG: State: RUNNING Progress: 21.02% 
TotalTasks: 2792 Succeeded: 587 Running: 237 Failed: 0 Killed: 0
2014-08-21 18:28:24,774 INFO  [main] rpc.DAGClientRPCImpl 
(DAGClientRPCImpl.java:log(428)) - DAG: State: RUNNING Progress: 21.06% 
TotalTasks: 2792 Succeeded: 588 Running: 237 Failed: 0 Killed: 0
{code}

getDagStatus().getVertexProgress, if invoked, always returns a List. Whether 
this list is populated depends on the state of the AM (DAG started, vertices 
created, etc.), so the null check against the proto is incorrect.
getProgress() has a similar problem: it can be null depending on AM state. If 
it is null, dagProgress gets updated from -1 to 0 and then does not change 
until a task actually completes, so nothing is logged.
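
A minimal sketch of the two checks described above, not the actual Tez code. 
StatusSketch and StatusChecks are hypothetical stand-ins for a protobuf-style 
status message and its caller. The assumption is that the list getter never 
returns null and that "progress not reported yet" should stay distinct from a 
real 0%.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a protobuf-style status message; NOT the Tez proto.
class StatusSketch {
    private final List<String> vertexProgress = new ArrayList<>();
    private Double progress; // stays null until the AM has something to report

    void addVertexProgress(String line) { vertexProgress.add(line); }
    void setProgress(double value) { progress = value; }

    // Like a repeated field: never null, possibly empty.
    List<String> getVertexProgressList() { return vertexProgress; }
    boolean hasProgress() { return progress != null; }
    double getProgress() { return progress == null ? 0.0 : progress; }
}

// Hypothetical caller-side checks.
class StatusChecks {
    static final double UNKNOWN = -1.0;

    // Test for emptiness; a null check on the list would always pass.
    static boolean hasVertexInfo(StatusSketch s) {
        return !s.getVertexProgressList().isEmpty();
    }

    // Keep "not reported yet" distinct from a genuine 0% progress value.
    static double progressOrUnknown(StatusSketch s) {
        return s.hasProgress() ? s.getProgress() : UNKNOWN;
    }
}
```

With this shape the monitoring loop can tell "AM has no vertex data yet" apart 
from "vertex data present", and the first real 0% still counts as a change.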

[~jeagles] - the change in the patch to print at a specific interval is 
required. Do you want to address the other issues as part of this JIRA? The 
API can currently return different results depending on the state of the AM.
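
For illustration, interval-based printing could be decided roughly as follows. 
This is only a sketch and not the code in the patch: IntervalLogSketch and 
shouldLog are hypothetical names, and the caller is assumed to pass in the 
current time in milliseconds.

```java
// Hypothetical helper: decides when a progress line should be printed.
class IntervalLogSketch {
    private final long intervalMillis;
    private long lastLogMillis;
    private double lastProgress;
    private boolean loggedOnce = false;

    IntervalLogSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Returns true on the first call, when progress has changed, or when
    // the interval has elapsed since the last printed line.
    boolean shouldLog(double progress, long nowMillis) {
        boolean changed = !loggedOnce || progress != lastProgress;
        boolean due = loggedOnce && (nowMillis - lastLogMillis) >= intervalMillis;
        if (changed || due) {
            loggedOnce = true;
            lastProgress = progress;
            lastLogMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

Passing the clock value in, rather than reading it inside the helper, keeps 
the decision easy to unit test.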

> DAGClient waitForCompletion output is confusing
> -----------------------------------------------
>
>                 Key: TEZ-1476
>                 URL: https://issues.apache.org/jira/browse/TEZ-1476
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Jonathan Eagles
>            Priority: Critical
>         Attachments: TEZ-1476-v1.patch
>
>
> When a DAG is submitted - "2014-08-21 16:38:06,153 INFO  [main] 
> rpc.DAGClientRPCImpl (DAGClientRPCImpl.java:log(428)) - Waiting for DAG to 
> start running" is logged.
> After this, nothing seems to get logged till the first task completes.
> It would be useful to log when the state changes to RUNNING - as well as at 
> least one line stating the number of tasks, etc (0% progress line). Also, 
> progress could be logged every few seconds irrespective of whether it has 
> changed or not to give the impression that the job has not just gotten stuck.


