[
https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326802#comment-14326802
]
Rohini Palaniswamy commented on TEZ-2119:
-----------------------------------------
[~sseth],
Getting older :(. You were right.
https://issues.apache.org/jira/browse/TEZ-987 is the one. Perhaps use this
issue for the counters and the other one to implement the APIs? I was recently
running a Pig script on a very small queue that can run only 76 containers at
a time. I was hoping the same 76 containers would be reused over and over for
the 33K tasks, but new containers were being launched often. I wonder whether
that was because of data locality. I have not read the AM logs yet, as they
are ~350M and I was feeling lazy about digging in. Is there something else
that can be added for this? Swimlanes may be useful for getting some idea of
container reuse, but I am thinking more in terms of being able to mine this
later with job stats populated in Hive tables.
> Counter for launched containers
> -------------------------------
>
> Key: TEZ-2119
> URL: https://issues.apache.org/jira/browse/TEZ-2119
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
>
> org.apache.tez.common.counters.DAGCounter
> NUM_SUCCEEDED_TASKS=32976
> TOTAL_LAUNCHED_TASKS=32976
> OTHER_LOCAL_TASKS=2
> DATA_LOCAL_TASKS=9147
> RACK_LOCAL_TASKS=23761
> It would be very nice to have a TOTAL_LAUNCHED_CONTAINERS counter added to
> this. The difference between TOTAL_LAUNCHED_CONTAINERS and
> TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is
> happening. It is very hard to find that out now.
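
As a rough illustration of the comparison described above, here is a minimal
sketch in Python. It assumes the counters are available as NAME=VALUE lines
like the dump in the description. TOTAL_LAUNCHED_CONTAINERS is the counter
being requested here, so its value below is hypothetical: 76 is simply the
ideal case from the comment, where every container in the queue is reused for
the whole job. The function names are made up for this example and are not
part of Tez.

```python
def parse_counters(text):
    """Parse NAME=VALUE counter lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip group headers such as the DAGCounter class name
        name, _, value = line.partition("=")
        counters[name.strip()] = int(value)
    return counters


def container_reuse_stats(launched_tasks, launched_containers):
    """Summarize container reuse from the two counter values.

    reused_launches: task launches that ran in an already-launched container.
    tasks_per_container: average number of tasks each container ran.
    """
    if launched_containers <= 0:
        raise ValueError("launched_containers must be positive")
    return {
        "reused_launches": max(launched_tasks - launched_containers, 0),
        "tasks_per_container": launched_tasks / launched_containers,
    }


dag_counters = parse_counters("""
org.apache.tez.common.counters.DAGCounter
NUM_SUCCEEDED_TASKS=32976
TOTAL_LAUNCHED_TASKS=32976
""")

# Hypothetical value: the counter does not exist yet; 76 is the ideal case.
ideal_containers = 76

stats = container_reuse_stats(dag_counters["TOTAL_LAUNCHED_TASKS"], ideal_containers)
print(stats["reused_launches"])  # 32900 task launches reused a container
```

The closer the real container count is to the queue capacity, the closer
reused_launches gets to TOTAL_LAUNCHED_TASKS. A container count approaching
the task count would mean almost no reuse.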
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)