[ 
https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120576#comment-14120576
 ] 

Siddharth Seth commented on TEZ-1447:
-------------------------------------

bq. I am sorry I don't agree with that. It would not be good to have 2 APIs 
where 1 should be enough. Registering for all events would be 
register(ENUM.ALL, Vertices) for anyone who needs to listen to all events. 
There is no need for register(vertex) and register(enum, vertex) to be separate.
The reason the API doesn't exist is that I don't think registration for 
individual event types is very useful. It's easy to ignore unnecessary events, 
and seeing all of them gives the handler a better picture of what could 
potentially happen.

On the API itself - still working this out, but as I said, I'm leaning towards 
just a notification event on the state change. The advantage of an event is 
that additional information can be provided via it, but that can be looked at 
later.
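
A minimal sketch of what a single-registration, state-change notification could 
look like. Every name here (SketchVertexState, StateChange, StateChangeListener, 
StateChangeNotifier) is hypothetical and not taken from the patch. The listener 
registers once per vertex, receives every state change, and ignores the ones it 
does not care about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical state set, for illustration only.
enum SketchVertexState { NEW, CONFIGURED, RUNNING, SUCCEEDED, FAILED }

// The notification object; more fields could be added later without
// changing the listener interface.
final class StateChange {
    final String vertexName;
    final SketchVertexState state;

    StateChange(String vertexName, SketchVertexState state) {
        this.vertexName = vertexName;
        this.state = state;
    }
}

interface StateChangeListener {
    void onStateChange(StateChange change);
}

final class StateChangeNotifier {
    private final Map<String, List<StateChangeListener>> listeners = new HashMap<>();

    // One registration call per vertex; there is no per-event-type variant.
    void register(String vertexName, StateChangeListener listener) {
        listeners.computeIfAbsent(vertexName, k -> new ArrayList<>()).add(listener);
    }

    // Delivers the new state to every listener registered for the vertex.
    void stateChanged(String vertexName, SketchVertexState newState) {
        List<StateChangeListener> registered = listeners.get(vertexName);
        if (registered == null) {
            return;
        }
        StateChange change = new StateChange(vertexName, newState);
        for (StateChangeListener listener : registered) {
            listener.onStateChange(change);
        }
    }
}

// A listener that only acts on CONFIGURED and ignores everything else.
final class ConfiguredOnlyListener implements StateChangeListener {
    final List<String> configured = new ArrayList<>();

    @Override
    public void onStateChange(StateChange change) {
        if (change.state == SketchVertexState.CONFIGURED) {
            configured.add(change.vertexName);
        }
        // All other states are deliberately ignored.
    }
}
```

Filtering in the handler keeps the registration surface to one method, at the 
cost of a few calls that the listener discards.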

bq. I mean v1 changes state that results in I1 being notified which makes v2 
change state that causes I2 to be notified. From the patch it seems all of this 
will happen on the dispatcher thread after it makes the v1 transition. What I was 
suggesting is that PubSub could send an event to the listener entity (via its 
vertex), instead of directly invoking the listener callback when the state 
changes. This will decouple all these notifications.
See previous comment. This gets delinked once the InputInitializerManager is 
run via threads, which is a separate jira.
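
As a rough illustration of the decoupling being discussed (not the 
InputInitializerManager change itself), the sketch below hands each 
notification to a separate single-threaded executor, so the publishing thread 
never runs listener code. AsyncNotifier and its methods are hypothetical names; 
only the java.util.concurrent calls are real.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical notifier: listeners run on their own thread, not the caller's.
final class AsyncNotifier {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void register(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Called from the (stand-in) dispatcher thread. It only enqueues work,
    // so a listener that triggers further state changes cannot run inline.
    void publish(String stateChange) {
        for (Consumer<String> listener : listeners) {
            executor.execute(() -> listener.accept(stateChange));
        }
    }

    // Stops accepting work and waits for queued notifications to drain.
    // Returns false on timeout or interruption.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A single-threaded executor preserves the order in which notifications were 
published, which matters if listeners assume they see state changes in sequence.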

bq. Another thing would be to add the same notification APIs on the VMs because 
they will also need this.
Separate jira.

bq. FIRE_ONCE_ON_SUCCESS ... 
Yes, this will be a separate jira. This one is already big enough. De-linked 
from TEZ-1531 as well. Will create a jira for this.


> Handle parallelism updates and versioning w/ custom InputInitializerEvents
> --------------------------------------------------------------------------
>
>                 Key: TEZ-1447
>                 URL: https://issues.apache.org/jira/browse/TEZ-1447
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Gunther Hagleitner
>            Assignee: Siddharth Seth
>            Priority: Blocker
>         Attachments: TEZ-1447.1.wip.txt
>
>
> I'm trying to do dynamic partition pruning through input initializer events 
> in Hive. That means that the initializer of a table scan vertex has to 
> receive events from all tasks in another vertex (which contain the pruning 
> info) before generating tasks to run.
> The problem with the current API I ran into:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a 
> vertex to be decided (-1 -> x). There's no way around it, because it's the 
> only way to find out what number of events to expect (0 is a valid number of 
> tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop. Because I 
> might be initially expecting 10 events, which later gets knocked down to 5. 
> Since there's no event associated with this, I have to periodically check 
> whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they 
> are coming from. Thus I can't de-dup events.
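
To make the three problems in the description concrete, here is a hedged sketch 
of the bookkeeping an initializer could do if each event carried its source task 
index and the task count arrived as a notification. Neither is available in the 
API described above. PruningEventTracker is a hypothetical name, and treating a 
higher version as superseding a lower one is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker: de-duplicates events per source task and blocks,
// without polling, until one event per expected task has arrived.
final class PruningEventTracker {
    private final Map<Integer, Integer> latestVersionByTask = new HashMap<>();
    private int expectedTasks = -1; // -1 means parallelism is not decided yet

    // Called when the task count is decided or later reduced (e.g. 10 -> 5).
    synchronized void setExpectedTasks(int count) {
        expectedTasks = count;
        notifyAll();
    }

    // Returns true if the event is new or supersedes an older version from
    // the same task; returns false for a duplicate or stale event.
    synchronized boolean onEvent(int taskIndex, int version) {
        Integer previous = latestVersionByTask.get(taskIndex);
        if (previous != null && previous >= version) {
            return false;
        }
        latestVersionByTask.put(taskIndex, version);
        notifyAll();
        return true;
    }

    // Complete once every task index below the expected count has reported.
    // Zero expected tasks is complete immediately.
    synchronized boolean isComplete() {
        if (expectedTasks < 0) {
            return false;
        }
        int reported = 0;
        for (int taskIndex : latestVersionByTask.keySet()) {
            if (taskIndex < expectedTasks) {
                reported++;
            }
        }
        return reported >= expectedTasks;
    }

    // Replaces the busy loop: waits until isComplete() becomes true.
    synchronized void awaitComplete() throws InterruptedException {
        while (!isComplete()) {
            wait();
        }
    }
}
```

Because completion is re-evaluated on every event and on every change to the 
expected count, a reduction in parallelism wakes the waiter without any 
periodic checking.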



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
