[ https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121105#comment-14121105 ]
Siddharth Seth commented on TEZ-1447:
-------------------------------------
bq. A separate jira is fine, though the feature seems incomplete without adding
it to the VertexManagerPluginContext.
This jira is more for making the InputInitializer aware of state changes, and
not for the entire system. For VertexManagerPlugins, this leads to a slightly
messy API, considering there are already APIs like onVertexStarted.
bq. Option 1) register(VertexName) - the listener gets notifications about all
state changes published by the vertex.
Option 2) register(ENUM, VertexName) - the listener registers for a specific
change and gets notified when that happens.
One more reason I like the vertexName API better is that if Tez had the concept
of control connections - to indicate the flow of control information between
vertices and VMs / initializers - a register API would not be required at all.
This, IMO, is a better solution than the current requirement for users to send
the target vertex information as part of the event.
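
For illustration only, the two registration options quoted above could be sketched roughly as follows. Every name here (VertexStateRegistry, VertexState, Listener, publish) is hypothetical and made up for this sketch; none of it is the Tez API or the contents of the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two register() variants; not part of Tez.
class VertexStateRegistry {

    // Assumed set of interesting states; the names are illustrative.
    enum VertexState { CONFIGURED, RUNNING, SUCCEEDED }

    // Callback invoked when a registered vertex publishes a state change.
    interface Listener {
        void onStateUpdated(String vertexName, VertexState state);
    }

    // One registration: the states of interest plus the callback.
    private static final class Registration {
        final Set<VertexState> states;
        final Listener listener;

        Registration(Set<VertexState> states, Listener listener) {
            this.states = states;
            this.listener = listener;
        }
    }

    private final Map<String, List<Registration>> byVertex = new HashMap<>();

    // Option 1: notified about every state change published by the vertex.
    void register(String vertexName, Listener listener) {
        add(vertexName, EnumSet.allOf(VertexState.class), listener);
    }

    // Option 2: notified only when the vertex reaches one specific state.
    void register(VertexState state, String vertexName, Listener listener) {
        add(vertexName, EnumSet.of(state), listener);
    }

    private void add(String vertexName, Set<VertexState> states, Listener listener) {
        byVertex.computeIfAbsent(vertexName, k -> new ArrayList<>())
                .add(new Registration(states, listener));
    }

    // Called by the (assumed) framework side when a vertex changes state.
    void publish(String vertexName, VertexState state) {
        List<Registration> regs =
                byVertex.getOrDefault(vertexName, Collections.emptyList());
        for (Registration r : regs) {
            if (r.states.contains(state)) {
                r.listener.onStateUpdated(vertexName, state);
            }
        }
    }
}
```

In this sketch Option 1 is just Option 2 with every state selected, so the two differ only in how much filtering the listener has to do itself.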
> Provide notification mechanism for user code to know about interesting Vertex
> state changes
> -------------------------------------------------------------------------------------------
>
> Key: TEZ-1447
> URL: https://issues.apache.org/jira/browse/TEZ-1447
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Gunther Hagleitner
> Assignee: Siddharth Seth
> Priority: Blocker
> Attachments: TEZ-1447.1.wip.txt
>
>
> I'm trying to do dynamic partition pruning through input initializer events
> in Hive. That means that the initializer of a table scan vertex has to
> receive events from all tasks in another vertex (which contain the pruning
> info) before generating tasks to run.
> The problem with the current API I ran into:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a
> vertex to be decided (-1 -> x). There's no way around it, because it's the
> only way to find out what number of events to expect (0 is a valid number of
> tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I
> might initially be expecting 10 events, a number which later gets knocked
> down to 5. Since there's no event associated with this, I have to
> periodically check whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they
> are coming from. Thus I can't de-dup events.