[ https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120576#comment-14120576 ]
Siddharth Seth commented on TEZ-1447:
-------------------------------------
bq. I am sorry, I don't agree with that. It would not be good to have 2 APIs
where 1 should be enough. Registering for all events would be
register(ENUM.ALL, Vertices) for anyone who needs to listen to all events.
There is no need for register(vertex) and register(enum, vertex) to be separate.
The reason the API doesn't exist is that I don't think registration for
individual event types is very useful. It's easy to ignore unnecessary events,
and seeing all of them gives a better picture, in the handler, of what could
potentially happen.
On the API itself - still working this out, but as I said, I'm leaning towards
just a notification event on each state change. The advantage of an event is
that additional information can be provided via it, but that can be looked at
later.
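To illustrate the "register once, ignore what you don't need" argument, here is a minimal sketch. Every name in it (RegisterAllSketch, VertexState, VertexStateUpdate, VertexStateListener, StateChangeRegistry, TerminalStateHandler) is hypothetical and made up for this example; it is not the API from the patch, which is still being worked out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types are taken from Tez.
class RegisterAllSketch {
    enum VertexState { CONFIGURED, RUNNING, SUCCEEDED, FAILED }

    // Notification carrying the vertex name and its new state.
    static final class VertexStateUpdate {
        final String vertexName;
        final VertexState state;

        VertexStateUpdate(String vertexName, VertexState state) {
            this.vertexName = vertexName;
            this.state = state;
        }
    }

    interface VertexStateListener {
        void onStateUpdated(VertexStateUpdate update);
    }

    // One registration call per vertex; the listener sees every state change.
    static final class StateChangeRegistry {
        private final Map<String, List<VertexStateListener>> listeners = new HashMap<>();

        void register(String vertexName, VertexStateListener listener) {
            listeners.computeIfAbsent(vertexName, k -> new ArrayList<>()).add(listener);
        }

        void publish(VertexStateUpdate update) {
            List<VertexStateListener> targets =
                listeners.getOrDefault(update.vertexName, Collections.emptyList());
            for (VertexStateListener listener : targets) {
                listener.onStateUpdated(update);
            }
        }
    }

    // A handler that only cares about terminal states and ignores the rest.
    static final class TerminalStateHandler implements VertexStateListener {
        final List<VertexState> seen = new ArrayList<>();

        @Override
        public void onStateUpdated(VertexStateUpdate update) {
            switch (update.state) {
                case SUCCEEDED:
                case FAILED:
                    seen.add(update.state);
                    break;
                default:
                    // Unnecessary for this handler; ignoring it is one line.
                    break;
            }
        }
    }

    public static void main(String[] args) {
        StateChangeRegistry registry = new StateChangeRegistry();
        TerminalStateHandler handler = new TerminalStateHandler();
        registry.register("v1", handler);
        registry.publish(new VertexStateUpdate("v1", VertexState.RUNNING));   // ignored
        registry.publish(new VertexStateUpdate("v1", VertexState.SUCCEEDED)); // recorded
        System.out.println(handler.seen); // prints [SUCCEEDED]
    }
}
```

The switch in the handler doubles as a list of everything that can reach it, which is the "better picture" point made above.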
bq. I mean v1 changes state that results in I1 being notified which makes V2
change state that causes I2 to be notified. From the patch it seems all of this
will happen on the dispatcher thread after it makes the v1 transition. What I was
suggesting is that PubSub could send an event to the listener entity (via its
vertex), instead of directly invoking the listener callback when the state
changes. This will decouple all these notifications.
See previous comment. This gets delinked once the InputInitializerManager is
run via threads, which is a separate jira.
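For reference, the kind of decoupling being discussed could look roughly like the sketch below: the publisher hands each notification to a separate delivery thread instead of invoking the callback on the calling (dispatcher) thread. This is only an illustration built on java.util.concurrent; DecoupledNotifySketch, AsyncPublisher, and drain are hypothetical names, and this is not how the InputInitializerManager is or will be implemented.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical names throughout; none of these types are taken from Tez.
class DecoupledNotifySketch {
    static final class AsyncPublisher {
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();
        // A single delivery thread keeps notifications in publish order.
        private final ExecutorService delivery = Executors.newSingleThreadExecutor();

        void register(Consumer<String> listener) {
            listeners.add(listener);
        }

        // Called on the dispatcher thread; returns without running any callback.
        void publish(String stateChange) {
            for (Consumer<String> listener : listeners) {
                delivery.execute(() -> listener.accept(stateChange));
            }
        }

        // Stops accepting work and waits for queued notifications to be delivered.
        boolean drain(long timeout, TimeUnit unit) {
            delivery.shutdown();
            try {
                return delivery.awaitTermination(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AsyncPublisher pubSub = new AsyncPublisher();
        Thread dispatcher = Thread.currentThread();
        List<String> log = new CopyOnWriteArrayList<>();
        pubSub.register(change -> log.add(
            change + (Thread.currentThread() == dispatcher ? " (dispatcher)" : " (delivery)")));
        pubSub.publish("v1:RUNNING");
        pubSub.publish("v2:RUNNING");
        pubSub.drain(5, TimeUnit.SECONDS);
        System.out.println(log);
    }
}
```

With this shape, a listener that triggers a further state change only enqueues more work; the chain v1 -> I1 -> V2 -> I2 no longer runs inside a single dispatcher call.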
bq. Another thing would be to add the same notification APIs on the VMs because
they will also need this.
Separate jira.
bq. FIRE_ONCE_ON_SUCCESS ...
Yes, this will be a separate jira. This one is already big enough. De-linked
from TEZ-1531 as well. Will create a jira for this.
> Handle parallelism updates and versioning w/ custom InputInitializerEvents
> --------------------------------------------------------------------------
>
> Key: TEZ-1447
> URL: https://issues.apache.org/jira/browse/TEZ-1447
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Gunther Hagleitner
> Assignee: Siddharth Seth
> Priority: Blocker
> Attachments: TEZ-1447.1.wip.txt
>
>
> I'm trying to do dynamic partition pruning through input initializer events
> in Hive. That means that the initializer of a table scan vertex has to
> receive events from all tasks in another vertex (which contain the pruning
> info) before generating tasks to run.
> The problem with the current API I ran into:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a
> vertex to be decided (-1 -> x). There's no way around it, because it's the
> only way to find out what number of events to expect (0 is a valid number of
> tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I
> might initially be expecting 10 events, which later gets knocked down to 5.
> Since there's no event associated with this, I have to periodically check
> whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they
> are coming from. Thus I can't de-dup events.