[ https://issues.apache.org/jira/browse/TEZ-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006416#comment-14006416 ]
Bikas Saha commented on TEZ-1145:
---------------------------------

Pig is trying to determine the number of partitions using sampling. Using some edge manager in the beginning and changing it later might work when the new number of partitions is lower than the earlier one (like what auto shuffle (hash partition reduce) does). But Pig can in fact determine that the ideal partitioning is higher than that estimate. So the current best option is to not set any edge manager at all, since the real number of partitions is going to be determined accurately at runtime, and to set the parallelism and the edge manager at that time. However, in the meanwhile, the tasks that are supposed to partition the data get launched before the edge gets finalized (because the inputs are available/slow-start/immediate-start etc).

TEZ-937 would be a good way to add new APIs to handle similar dependencies in case user code needs to handle them. For now, in this jira, I am planning to make sure that vertices don't start until their edges are finalized. This is essentially correct.

I don't want to be super-restrictive and say vertices cannot INIT until their edges are finalized, since that may unnecessarily slow down things like input initialization etc. Also, initialization is when a lot of this runtime-dependent logic gets triggered. So it's important to initialize as soon as possible.

Eventually, when a vertex is ready to start (i.e. after all its ancestors have started), that is when the vertex can try to run tasks and things become real. This is when the vertex manager is invoked, scheduling starts etc. So before we start, we should make sure everything above and below has been initialized. That's what I am targeting in this jira. It seems correct and is in fact something we should have already been checking for custom edges.
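
To make the intended rule concrete, here is a rough sketch of the start check described above. It is a toy model and not the Tez API: VertexStartGate, Vertex, Edge, connect, tryStart and the string used as an edge manager are all hypothetical names made up for illustration. It also assumes non-custom edges count as finalized from the beginning. INIT is allowed at any time, but a vertex only moves to RUNNING once its sources are running and every edge attached to it has an edge manager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these names are Tez classes.
final class VertexStartGate {

    enum State { NEW, INITED, RUNNING }

    /** An edge whose routing may only become known at runtime. */
    static final class Edge {
        private final boolean custom;
        private String edgeManager; // stand-in for a real edge manager object

        Edge(boolean custom) { this.custom = custom; }

        void setEdgeManager(String manager) { this.edgeManager = manager; }

        // Assumption: non-custom edges are treated as finalized up front.
        boolean isFinalized() { return !custom || edgeManager != null; }
    }

    static final class Vertex {
        private final String name;
        private final List<Vertex> sources = new ArrayList<>();
        private final List<Edge> edges = new ArrayList<>(); // incoming and outgoing
        private State state = State.NEW;

        Vertex(String name) { this.name = name; }

        String name() { return name; }

        State state() { return state; }

        // Wire this vertex to a downstream vertex; both ends track the edge.
        void connect(Vertex downstream, Edge edge) {
            downstream.sources.add(this);
            this.edges.add(edge);
            downstream.edges.add(edge);
        }

        // Initialization is not blocked by edges that are still undecided.
        void init() {
            if (state == State.NEW) {
                state = State.INITED;
            }
        }

        // Start only if inited, all sources are running, and all edges are finalized.
        // Returns false and leaves the state unchanged when a condition is not met.
        boolean tryStart() {
            if (state == State.RUNNING) return true;
            if (state != State.INITED) return false;
            for (Vertex source : sources) {
                if (source.state != State.RUNNING) return false;
            }
            for (Edge edge : edges) {
                if (!edge.isFinalized()) return false;
            }
            state = State.RUNNING;
            return true;
        }
    }
}
```

In this model the caller has to retry tryStart() after setting the edge manager. A real implementation would more likely re-evaluate the check from the event that finalizes the edge.
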

> Vertices should not start if they have uninitialized custom edges
> -----------------------------------------------------------------
>
>                 Key: TEZ-1145
>                 URL: https://issues.apache.org/jira/browse/TEZ-1145
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Bikas Saha
>
> If the vertex is connected to a custom edge and the edge manager has not been
> set yet, then that vertex should not start. If it does then it will end up
> starting tasks that don't have all their specifications identified.

--
This message was sent by Atlassian JIRA
(v6.2#6252)