Peter Slawski created TEZ-3356:
----------------------------------
Summary: Fix initializing of stats when custom
ShuffleVertexManager is used
Key: TEZ-3356
URL: https://issues.apache.org/jira/browse/TEZ-3356
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Peter Slawski
When a custom ShuffleVertexManager is used to set a vertex’s parallelism, the
partition stats field is left uninitialized even after the manager itself has
been initialized. As a result, an IllegalStateException is thrown when
VertexManagerEvents are processed at the start of the vertex, because the
stats field has not yet been initialized at that point. Note that these events
contain partition sizes, which are aggregated and stored in this stats field.
Apache Pig’s grace auto-parallelism feature uses a custom ShuffleVertexManager
which sets a vertex’s parallelism upon the completion of one of its parent’s
parents. This corner case is therefore hit, and Pig scripts with grace
parallelism enabled fail if the DAG contains at least one vertex that has
grandparents.
The fix should be straightforward: update pending tasks before, rather than
after, VertexManagerEvents are processed, so that the partition stats field is
initialized by the time the events are aggregated.
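
The ordering problem can be illustrated with a minimal, self-contained sketch.
This is not the Tez source: the class, field, and method names below
(PartitionStatsSketch, Manager, SizeEvent, updatePendingTasks,
processPendingEvents) are hypothetical stand-ins, and the sketch assumes only
what is described above, namely that updating pending tasks initializes the
stats field and that event processing requires it to be initialized.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the ordering issue; all names here are hypothetical. */
class PartitionStatsSketch {

    /** Stand-in for an event that reports one partition's output size. */
    static final class SizeEvent {
        final int partition;
        final long bytes;

        SizeEvent(int partition, long bytes) {
            this.partition = partition;
            this.bytes = bytes;
        }
    }

    /** Stand-in for a vertex manager that aggregates partition sizes. */
    static final class Manager {
        private final int parallelism;
        private final List<SizeEvent> pendingEvents = new ArrayList<>();
        private long[] partitionStats; // stays null until pending tasks are updated

        Manager(int parallelism) {
            this.parallelism = parallelism;
        }

        void queueEvent(SizeEvent event) {
            pendingEvents.add(event);
        }

        /** Assumed side effect: allocates the stats array on first call. */
        void updatePendingTasks() {
            if (partitionStats == null) {
                partitionStats = new long[parallelism];
            }
        }

        /** Aggregates queued sizes; fails if the stats array is missing. */
        void processPendingEvents() {
            for (SizeEvent event : pendingEvents) {
                if (partitionStats == null) {
                    throw new IllegalStateException("partition stats not initialized");
                }
                partitionStats[event.partition] += event.bytes;
            }
            pendingEvents.clear();
        }

        /** Problematic order: events first, initialization second. */
        void onVertexStartedBuggy() {
            processPendingEvents();
            updatePendingTasks();
        }

        /** Proposed order: initialization first, events second. */
        void onVertexStartedFixed() {
            updatePendingTasks();
            processPendingEvents();
        }

        long statsFor(int partition) {
            return partitionStats[partition];
        }
    }

    public static void main(String[] args) {
        Manager buggy = new Manager(2);
        buggy.queueEvent(new SizeEvent(0, 100L));
        try {
            buggy.onVertexStartedBuggy();
        } catch (IllegalStateException e) {
            System.out.println("buggy order failed: " + e.getMessage());
        }

        Manager fixed = new Manager(2);
        fixed.queueEvent(new SizeEvent(0, 100L));
        fixed.queueEvent(new SizeEvent(0, 50L));
        fixed.onVertexStartedFixed();
        System.out.println("fixed order, partition 0 bytes: " + fixed.statsFor(0));
    }
}
```

In this sketch the only difference between the two start-up paths is the order
of the two calls, which mirrors the proposed change. The actual patch may
differ in where and how the stats field gets initialized.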
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)