Oliver Draese created HIVE-21785:
------------------------------------

             Summary: Add task queue/runtime stats per LLAP daemon to output
                 Key: HIVE-21785
                 URL: https://issues.apache.org/jira/browse/HIVE-21785
             Project: Hive
          Issue Type: Improvement
          Components: llap
    Affects Versions: 3.1.1
            Reporter: Oliver Draese
            Assignee: Oliver Draese
             Fix For: 3.1.1


In several scenarios, we want to investigate whether a particular LLAP daemon 
is performing faster or slower than the others in the cluster. In these 
scenarios, it is especially important to determine whether tasks spent 
significant time waiting for an available executor (queued) rather than on the 
execution itself. A skew in task-to-daemon assignment is also of interest.

This patch adds these statistics to the TezCounters, and therefore to the job 
output, on a per-LLAP-daemon basis. Here is an example:

{{INFO : LlapTaskRuntimeAgg by daemon:}}
{{INFO :    Count-host-1.example.com: 41}}
{{INFO :    Count-host-2.example.com: 39}}
{{INFO :    Count-host-3.example.com: 45}}
{{INFO :    QueueTime-host-1.example.com: 51437776}}
{{INFO :    QueueTime-host-2.example.com: 35758306}}
{{INFO :    QueueTime-host-3.example.com: 47168327}}
{{INFO :    RunTime-host-1.example.com: 165151539295}}
{{INFO :    RunTime-host-2.example.com: 141729193528}}
{{INFO :    RunTime-host-3.example.com: 166876988771}}

The "Count-" values are simple task counts for the appended host name (LLAP 
daemon).

The "QueueTime-" values show how long tasks waited in the 
TaskExecutorService's queue before actually being executed.

The "RunTime-" values cover the time from execution start to finish (where 
finish can either be successful execution or a killed/failed execution).
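
To compare daemons, the raw sums are easier to read once they are divided by the task count. The following is a minimal sketch, not part of the patch: it parses summary lines in the format shown above and derives per-task averages plus the share of time spent queued. The function names (parse_llap_summary, per_task_averages) and the regular expression are hypothetical helpers. The issue does not state the time unit of the counters, so the sketch only reports ratios in whatever unit the counters use.

```python
import re
from collections import defaultdict

# Sample lines copied from the example output in this issue.
SAMPLE = """\
INFO : LlapTaskRuntimeAgg by daemon:
INFO :    Count-host-1.example.com: 41
INFO :    Count-host-2.example.com: 39
INFO :    Count-host-3.example.com: 45
INFO :    QueueTime-host-1.example.com: 51437776
INFO :    QueueTime-host-2.example.com: 35758306
INFO :    QueueTime-host-3.example.com: 47168327
INFO :    RunTime-host-1.example.com: 165151539295
INFO :    RunTime-host-2.example.com: 141729193528
INFO :    RunTime-host-3.example.com: 166876988771
"""

# Assumed line shape: "<Metric>-<host>: <integer>", as in the example above.
LINE_RE = re.compile(r"(Count|QueueTime|RunTime)-(\S+):\s*(\d+)")


def parse_llap_summary(text):
    """Return {host: {"Count": n, "QueueTime": q, "RunTime": r}}."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            metric, host, value = match.groups()
            stats[host][metric] = int(value)
    return dict(stats)


def per_task_averages(stats):
    """Derive per-task averages and the queued share of total time per host."""
    result = {}
    for host, values in stats.items():
        count = values.get("Count", 0)
        if count == 0:
            continue  # no tasks on this daemon, nothing to average
        queue = values.get("QueueTime", 0)
        run = values.get("RunTime", 0)
        total = queue + run
        result[host] = {
            "avg_queue": queue / count,
            "avg_run": run / count,
            "queue_share": queue / total if total else 0.0,
        }
    return result


if __name__ == "__main__":
    for host, avg in sorted(per_task_averages(parse_llap_summary(SAMPLE)).items()):
        print(f"{host}: avg_queue={avg['avg_queue']:.0f} "
              f"avg_run={avg['avg_run']:.0f} "
              f"queue_share={avg['queue_share']:.6f}")
```

A daemon whose avg_queue or queue_share stands out from the others points to executor contention, a diverging avg_run points to slower execution on that host, and uneven Count values indicate assignment skew.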

For the new counters to appear in the output, both the preexisting 
hive.tez.exec.print.summary and the new hive.llap.task.time.print.summary have 
to be set to true.

 
{{<property>}}
{{  <name>hive.tez.exec.print.summary</name>}}
{{  <value>true</value>}}
{{</property>}}
{{<property>}}
{{  <name>hive.llap.task.time.print.summary</name>}}
{{  <value>true</value>}}
{{</property>}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
