[ 
https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101304#comment-14101304
 ] 

Bikas Saha commented on TEZ-1447:
---------------------------------

Some more thoughts on the pub-sub idea.
1) The state machines notify the pub-sub logic by publishing interesting 
happenings, e.g. V1 started running, V2 succeeded, V3 parallelism changed, etc.
2) Any listener can register for notifications on the above interesting 
happenings. So the initializer can register on V1 started running or succeeded. 
When it gets notified that V1 started running, it can get the number of 
tasks in V1 and wait for that many tasks. When it gets the V1 succeeded 
notification, it knows it's done (either because of 0 tasks or because all 
tasks completed).
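
A rough sketch of what the two points above could look like in Java. 
Everything here (VertexStateBus, VertexStateUpdate, PruningInitializerSketch 
and their methods) is made up for illustration and is not existing Tez API. 
Delivery is synchronous to keep it short; a real implementation would 
presumably go through the AM's dispatcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch only: none of these types are part of the Tez API.
class VertexStatePubSubSketch {

    enum VertexState { RUNNING, SUCCEEDED, PARALLELISM_CHANGED }

    /** Immutable notification published by a vertex state machine. */
    static final class VertexStateUpdate {
        final String vertexName;
        final VertexState state;
        final int numTasks; // task count of the vertex at publish time

        VertexStateUpdate(String vertexName, VertexState state, int numTasks) {
            this.vertexName = vertexName;
            this.state = state;
            this.numTasks = numTasks;
        }
    }

    /** Minimal synchronous registry keyed by vertex name and state. */
    static final class VertexStateBus {
        private final Map<String, Map<VertexState, List<Consumer<VertexStateUpdate>>>> listeners =
                new HashMap<>();

        synchronized void register(String vertexName, VertexState state,
                                   Consumer<VertexStateUpdate> listener) {
            listeners.computeIfAbsent(vertexName, k -> new HashMap<>())
                     .computeIfAbsent(state, k -> new ArrayList<>())
                     .add(listener);
        }

        void publish(VertexStateUpdate update) {
            List<Consumer<VertexStateUpdate>> targets;
            synchronized (this) {
                // Copy under the lock so listeners run without holding it.
                Map<VertexState, List<Consumer<VertexStateUpdate>>> byState =
                        listeners.getOrDefault(update.vertexName, Collections.emptyMap());
                targets = new ArrayList<>(
                        byState.getOrDefault(update.state, Collections.emptyList()));
            }
            for (Consumer<VertexStateUpdate> listener : targets) {
                listener.accept(update);
            }
        }
    }

    /** Initializer side: expects one event per task of the source vertex. */
    static final class PruningInitializerSketch {
        private int expectedEvents = -1; // unknown until the vertex starts running
        private int receivedEvents = 0;
        private boolean sourceSucceeded = false;

        PruningInitializerSketch(VertexStateBus bus, String sourceVertex) {
            bus.register(sourceVertex, VertexState.RUNNING, this::onParallelismKnown);
            bus.register(sourceVertex, VertexState.PARALLELISM_CHANGED, this::onParallelismKnown);
            bus.register(sourceVertex, VertexState.SUCCEEDED, u -> onSucceeded());
        }

        synchronized void onParallelismKnown(VertexStateUpdate update) {
            expectedEvents = update.numTasks;
        }

        synchronized void onSucceeded() {
            sourceSucceeded = true;
        }

        /** Called once for each InputInitializerEvent received from a source task. */
        synchronized void onInputInitializerEvent() {
            receivedEvents++;
        }

        /** Done when the source vertex succeeded (covers 0 tasks) or all events arrived. */
        synchronized boolean readyToGenerateTasks() {
            return sourceSucceeded
                    || (expectedEvents >= 0 && receivedEvents >= expectedEvents);
        }
    }

    public static void main(String[] args) {
        VertexStateBus bus = new VertexStateBus();
        PruningInitializerSketch init = new PruningInitializerSketch(bus, "V1");
        bus.publish(new VertexStateUpdate("V1", VertexState.RUNNING, 10));
        bus.publish(new VertexStateUpdate("V1", VertexState.PARALLELISM_CHANGED, 5));
        for (int i = 0; i < 5; i++) {
            init.onInputInitializerEvent();
        }
        System.out.println("ready = " + init.readyToGenerateTasks());
    }
}
```

The point is that the initializer never polls: it only reacts to RUNNING, 
PARALLELISM_CHANGED and SUCCEEDED, so both busy loops from the description 
go away, and the 0-task case falls out of the SUCCEEDED notification.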

> Handle parallelism updates and versioning w/ custom InputInitializerEvents
> --------------------------------------------------------------------------
>
>                 Key: TEZ-1447
>                 URL: https://issues.apache.org/jira/browse/TEZ-1447
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Gunther Hagleitner
>            Assignee: Bikas Saha
>            Priority: Blocker
>             Fix For: 0.5.0
>
>
> I'm trying to do dynamic partition pruning through input initializer events 
> in Hive. That means that the initializer of a table scan vertex has to 
> receive events from all tasks in another vertex (which contain the pruning 
> info) before generating tasks to run.
> The problem with the current API I ran into:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a 
> vertex to be decided (-1 -> x). There's no way around it, because it's the 
> only way to find out what number of events to expect (0 is a valid number of 
> tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I 
> might initially be expecting 10 events, which later gets knocked down to 5. 
> Since there's no event associated with this, I have to periodically check 
> whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they 
> are coming from. Thus I can't de-dup events.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
