[
https://issues.apache.org/jira/browse/TEZ-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213559#comment-16213559
]
Prasanth Jayachandran commented on TEZ-3856:
--------------------------------------------
The problem with the vertex counters is that getCounters() always returns a new
instance. Whenever we invoke getCounters() on a vertex, it iterates through all
tasks, aggregates all task counters, and returns a snapshot, but the vertex has
no counters of its own. There is no counter that captures the vertex's state.
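To make the snapshot behaviour concrete, here is a minimal sketch. It is a
simplified model, not the actual Tez vertex code: SnapshotVertex is a
hypothetical class, and plain maps stand in for TezCounters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified model; not the real Tez vertex or TezCounters classes.
class SnapshotVertex {
    // Each task owns its counters, keyed by counter name.
    private final List<Map<String, Long>> taskCounters = new ArrayList<>();

    void addTask(Map<String, Long> counters) {
        taskCounters.add(counters);
    }

    // Builds a fresh aggregate on every call; the vertex keeps no counters of its own.
    Map<String, Long> getCounters() {
        Map<String, Long> snapshot = new HashMap<>();
        for (Map<String, Long> task : taskCounters) {
            task.forEach((name, value) -> snapshot.merge(name, value, Long::sum));
        }
        return snapshot;
    }

    public static void main(String[] args) {
        SnapshotVertex y = new SnapshotVertex();
        // Written during split generation, before any task exists.
        y.getCounters().put("INPUT_FILES", 100L);
        // The write landed on a throwaway snapshot, so the next call does not see it.
        System.out.println(y.getCounters().containsKey("INPUT_FILES")); // prints false
    }
}
```

In this model, anything written to the returned object is lost on the next
call, which is why a counter published at initialization time needs a home on
the vertex itself.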
Thinking about it further, this patch by itself might not be sufficient, as it
merges the input counters into the full final counters only *after* vertex
termination.
The use case I am looking at is this: say there are 3 vertices X, Y and Z, where
X reads from 10 files, Y reads from 100 files, and both send data to the
downstream vertex Z. During split generation, Hive will know that Y is going to
read 100 files (assume 1 split = 1 file) and will have to publish a custom Hive
counter, INPUT_FILES, under, say, the "HIVE" counter group. In the meantime,
Hive's job monitor checks the INPUT_FILES counter at a fixed interval by
probing the DAG counters. If this counter exceeds a threshold (say 50), Hive
will issue a DAG kill before (or immediately after) Y launches 100 tasks to
read the input files.
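The monitoring side of this use case could look roughly like the sketch below.
Everything here is an assumption for illustration: InputFilesMonitor, the
probe supplier and the killDag callback are hypothetical names, and the nested
map (group -> counter name -> value) stands in for the DAG counters that the
real job monitor would fetch from Tez.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical monitor sketch; the real Hive job monitor and Tez client APIs differ.
class InputFilesMonitor {
    static final String GROUP = "HIVE";
    static final String COUNTER = "INPUT_FILES";

    // Reads HIVE.INPUT_FILES from a group -> name -> value view of the DAG counters.
    static long inputFiles(Map<String, Map<String, Long>> dagCounters) {
        return dagCounters.getOrDefault(GROUP, Map.of()).getOrDefault(COUNTER, 0L);
    }

    // Polls up to maxPolls times at a fixed interval; returns true if a kill was requested.
    static boolean monitor(Supplier<Map<String, Map<String, Long>>> probe,
                           long threshold, int maxPolls, long intervalMs,
                           Runnable killDag) {
        for (int i = 0; i < maxPolls; i++) {
            if (inputFiles(probe.get()) > threshold) {
                killDag.run();
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop polling.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

With X contributing 10 and Y contributing 100, the DAG-level value would be
110, so a threshold of 50 triggers the kill on the first probe that sees the
counter. That only helps if the counter is visible before the tasks finish.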
IMHO the vertex seems to be the right place, as every vertex has to be
initialized with inputs (be it via split generation or an edge).
Any other ideas to achieve the above use case?
I will update the patch so that getCounters() also calls
incrAllCounters(inputCounters) when the vertex is in a non-terminal state.
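A rough sketch of the behaviour I have in mind, under the same simplifying
assumptions (InitAwareVertex is a hypothetical class, plain maps stand in for
TezCounters, and the map merge stands in for incrAllCounters; the actual patch
may differ):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed behaviour; not the real Tez vertex code.
class InitAwareVertex {
    // Counters published by the input initializer during split generation.
    private final Map<String, Long> inputCounters = new HashMap<>();
    private final List<Map<String, Long>> taskCounters = new ArrayList<>();
    // Set once the vertex reaches a terminal state.
    private Map<String, Long> finalCounters;

    Map<String, Long> getInputCounters() {
        return inputCounters;
    }

    void addTask(Map<String, Long> counters) {
        taskCounters.add(counters);
    }

    void terminate() {
        finalCounters = aggregate();
    }

    // Non-terminal: fresh snapshot that includes the input counters.
    // Terminal: a copy of the full final counters.
    Map<String, Long> getCounters() {
        return finalCounters != null ? new HashMap<>(finalCounters) : aggregate();
    }

    private Map<String, Long> aggregate() {
        // Start from the input counters, then add every task's counters.
        Map<String, Long> snapshot = new HashMap<>(inputCounters);
        for (Map<String, Long> task : taskCounters) {
            task.forEach((name, value) -> snapshot.merge(name, value, Long::sum));
        }
        return snapshot;
    }
}
```

Here a value put into getInputCounters() during initialization shows up in
getCounters() right away, before any task has been launched, and is still
present in the final counters after termination.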
> API to access counters in InputInitializerContext
> -------------------------------------------------
>
> Key: TEZ-3856
> URL: https://issues.apache.org/jira/browse/TEZ-3856
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: TEZ-3856.1.patch
>
>
> Hive would like to publish some counters related to input splits during split
> generation. Tez doesn't expose TezCounters via InputInitializerContext. This
> ticket is to expose TezCounters via InputInitializerContext so that counters
> can be accessed during split generation.