[ 
https://issues.apache.org/jira/browse/TEZ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006416#comment-14006416
 ] 

Bikas Saha commented on TEZ-1145:
---------------------------------

Pig is trying to determine the number of partitions using sampling. Using some
edge manager at the beginning and changing it later might work for cases where
the new number of partitions is lower than the earlier estimate (like what auto
shuffle (hash partition reduce) does). But Pig can in fact determine that the
ideal partitioning is higher than that estimate. So the current best option is
to not set any edge manager at all, since the real number of partitions is
going to be determined exactly at runtime, and to set the parallelism and edge
manager at that time. However, in the meantime, the tasks that are supposed to
partition the data get launched before the edge gets finalized (because the
inputs are available, slow-start, immediate-start, etc.). TEZ-937 would be a
good way to add new APIs to handle similar dependencies in case user code needs
to handle them.
For now, in this jira, I am planning to make sure that vertices don't start
until their edges are finalized. This is essentially correct. I don't want to
be super-restrictive and say vertices cannot INIT until their edges are
finalized, since that may unnecessarily slow down things like input
initialization. Also, initialization is when a lot of this runtime-dependent
logic gets triggered, so it's important to initialize as soon as possible.
Eventually, when a vertex is ready to start (i.e. after all its ancestors have
started), that is when the vertex can try to run tasks and things become real.
This is when the vertex manager is invoked, scheduling starts, etc. So before
we start, we should make sure everything above and below has been initialized.
That's what I am targeting in this jira. It seems correct, and in fact it is
something we should already have been checking for custom edges.
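The rule described above (INIT is not gated on edges, but START waits until
the ancestors have started and every custom edge has its edge manager set) can
be illustrated with a minimal, hypothetical Java sketch. The classes
`SketchEdge` and `SketchVertex` and the methods `isFinalized()` and
`tryStart()` are invented for illustration and are not Tez's actual vertex or
edge classes; the edge manager is modeled as a plain `Object` that stays `null`
until it is set at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: NOT Tez's actual vertex/edge implementation.
final class SketchEdge {
    private final boolean custom;   // true if the edge needs a custom edge manager
    private Object edgeManager;     // null until the manager is set (possibly at runtime)

    SketchEdge(boolean custom) { this.custom = custom; }

    void setEdgeManager(Object manager) { this.edgeManager = manager; }

    // A non-custom edge is always finalized; a custom one only once its manager is set.
    boolean isFinalized() { return !custom || edgeManager != null; }
}

final class SketchVertex {
    enum State { NEW, INITED, RUNNING }

    private State state = State.NEW;
    private final List<SketchEdge> inEdges = new ArrayList<>();
    private final List<SketchEdge> outEdges = new ArrayList<>();

    void addInEdge(SketchEdge e) { inEdges.add(e); }
    void addOutEdge(SketchEdge e) { outEdges.add(e); }

    // INIT is deliberately not gated on edges, so input initialization starts early.
    void init() {
        if (state == State.NEW) {
            state = State.INITED;
        }
    }

    // Checks edges both above (inputs) and below (outputs) the vertex.
    private boolean allEdgesFinalized() {
        for (SketchEdge e : inEdges) {
            if (!e.isFinalized()) return false;
        }
        for (SketchEdge e : outEdges) {
            if (!e.isFinalized()) return false;
        }
        return true;
    }

    // START requires INITED state, started ancestors, and all edges finalized.
    boolean tryStart(boolean allAncestorsStarted) {
        if (state == State.INITED && allAncestorsStarted && allEdgesFinalized()) {
            state = State.RUNNING; // the vertex manager would be invoked and scheduling begin here
            return true;
        }
        return false;
    }

    State getState() { return state; }
}
```

Keeping `init()` ungated lets initialization (and the runtime-dependent logic it
triggers) proceed as early as possible, while `tryStart()` refuses to move to
RUNNING, and so to schedule tasks, until no custom edge is left without a
manager.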

> Vertices should not start if they have uninitialized custom edges
> -----------------------------------------------------------------
>
>                 Key: TEZ-1145
>                 URL: https://issues.apache.org/jira/browse/TEZ-1145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Bikas Saha
>
> If the vertex is connected to a custom edge and the edge manager has not been 
> set yet, then that vertex should not start. If it does then it will end up 
> starting tasks that don't have all their specifications identified.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
