Peter Slawski created TEZ-3356:
----------------------------------

             Summary: Fix initializing of stats when custom ShuffleVertexManager is used
                 Key: TEZ-3356
                 URL: https://issues.apache.org/jira/browse/TEZ-3356
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.8.4
            Reporter: Peter Slawski


When a custom ShuffleVertexManager is used to set a vertex’s parallelism, the 
partition stats field is left uninitialized even after the manager itself has 
been initialized. As a result, an IllegalStateException is thrown when 
VertexManagerEvents are processed at the start of the vertex, because these 
events carry partition sizes that are aggregated and stored in the stats field.
 
Apache Pig’s grace auto-parallelism feature uses a custom ShuffleVertexManager 
that sets a vertex’s parallelism upon the completion of one of its parent’s 
parents. Pig scripts with grace parallelism enabled therefore hit this corner 
case and fail whenever the DAG contains at least one vertex that has 
grandparents.
 
The fix should be straightforward: update pending tasks before, rather than 
after, VertexManagerEvents are processed, so that the partition stats field is 
initialized by the time the events are handled.
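
To illustrate the ordering issue, here is a minimal sketch. It is a simplified 
model, not the Tez source: SketchVertexManager, updatePendingTasks(int), 
onVertexManagerEvent(long[]) and the two driver methods are hypothetical 
stand-ins, and it assumes the stats array is allocated lazily when pending 
tasks are updated.

```java
import java.util.Arrays;

// Simplified model of the ordering problem. This is NOT the Tez source:
// SketchVertexManager and its methods are hypothetical stand-ins.
class PartitionStatsSketch {

    static class SketchVertexManager {
        // Stays null until updatePendingTasks() has run at least once.
        private long[] partitionStats;

        // Assumption: the stats array is sized lazily at this point.
        void updatePendingTasks(int numPartitions) {
            if (partitionStats == null) {
                partitionStats = new long[numPartitions];
            }
        }

        // Aggregates the partition sizes carried by one event.
        void onVertexManagerEvent(long[] partitionSizes) {
            if (partitionStats == null) {
                throw new IllegalStateException("partition stats not initialized");
            }
            int n = Math.min(partitionSizes.length, partitionStats.length);
            for (int i = 0; i < n; i++) {
                partitionStats[i] += partitionSizes[i];
            }
        }

        long[] stats() {
            return partitionStats.clone();
        }
    }

    // Failing order: events are processed before pending tasks are updated.
    public static boolean buggyOrderThrows(long[][] events, int numPartitions) {
        SketchVertexManager manager = new SketchVertexManager();
        try {
            for (long[] event : events) {
                manager.onVertexManagerEvent(event);
            }
            manager.updatePendingTasks(numPartitions);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Proposed order: update pending tasks first, then process the events.
    public static long[] fixedOrder(long[][] events, int numPartitions) {
        SketchVertexManager manager = new SketchVertexManager();
        manager.updatePendingTasks(numPartitions);
        for (long[] event : events) {
            manager.onVertexManagerEvent(event);
        }
        return manager.stats();
    }

    public static void main(String[] args) {
        long[][] events = { {10L, 20L}, {5L, 5L} };
        System.out.println("buggy order throws: " + buggyOrderThrows(events, 2));
        System.out.println("fixed order stats:  " + Arrays.toString(fixedOrder(events, 2)));
    }
}
```

In this model the only difference between the two drivers is the call order, 
which is the change proposed above; the real patch may of course need to touch 
more than that.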
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
