[ https://issues.apache.org/jira/browse/TEZ-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383337#comment-15383337 ]
Peter Slawski commented on TEZ-3356:
------------------------------------
I've attached a patch that fixes this issue. It also adds a test case showing
how this bug can be hit with a custom ShuffleVertexManager that is essentially
a much simpler version of PigGraceShuffleVertexManager.
> Fix initializing of stats when custom ShuffleVertexManager is used
> ------------------------------------------------------------------
>
> Key: TEZ-3356
> URL: https://issues.apache.org/jira/browse/TEZ-3356
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.4
> Reporter: Peter Slawski
> Attachments: TEZ-3356.1.patch
>
>
> When a custom ShuffleVertexManager is used to set a vertex’s parallelism, the
> partition stats field is left uninitialized even after the manager itself
> has been initialized. As a result, an IllegalStateException is thrown when
> VertexManagerEvents are processed upon the start of the vertex, because the
> stats field has not yet been initialized. Note that these events contain
> partition sizes, which are aggregated and stored in this stats field.
>
> Apache Pig’s grace auto-parallelism feature uses a custom
> ShuffleVertexManager that sets a vertex’s parallelism upon the completion of
> one of its grandparents (a parent’s parent). Pig scripts with grace
> parallelism enabled therefore hit this corner case and fail whenever the DAG
> contains at least one vertex that has grandparents.
>
> The fix should be straightforward: update pending tasks before, rather than
> after, VertexManagerEvents are processed, so that the partition stats field
> is initialized by the time the events are aggregated into it.
>
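The ordering problem and the proposed fix described above can be illustrated with a small, self-contained Java sketch. It is a hypothetical, simplified model, not Tez's ShuffleVertexManager and not the attached TEZ-3356 patch: the class StartOrderSketch, the partitionStats field, and the methods queueEvent, processEvent, onVertexStartedBuggy, and onVertexStartedFixed are illustrative names. It assumes that the stats field is allocated only by updatePendingTasks(), as in the custom-manager case the issue describes, and that queued VertexManagerEvents carry per-partition sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the ordering bug in TEZ-3356.
// All names are illustrative assumptions; this is not Tez's ShuffleVertexManager.
public class StartOrderSketch {

    private final int numPartitions;
    // Aggregated per-partition sizes. Assumed to be allocated only by
    // updatePendingTasks(), mirroring the custom-manager case in the issue.
    private long[] partitionStats;
    // Payloads (partition sizes) of VertexManagerEvents queued before vertex start.
    private final List<long[]> pendingEvents = new ArrayList<>();

    public StartOrderSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public void queueEvent(long[] partitionSizes) {
        pendingEvents.add(partitionSizes);
    }

    // Stand-in for the step that initializes the stats field.
    private void updatePendingTasks() {
        if (partitionStats == null) {
            partitionStats = new long[numPartitions];
        }
    }

    // Aggregates one event's partition sizes into the stats field.
    private void processEvent(long[] partitionSizes) {
        if (partitionStats == null) {
            throw new IllegalStateException("partition stats not initialized");
        }
        for (int i = 0; i < partitionSizes.length; i++) {
            partitionStats[i] += partitionSizes[i];
        }
    }

    // Original order: drain queued events first, then update pending tasks.
    public void onVertexStartedBuggy() {
        for (long[] sizes : pendingEvents) {
            processEvent(sizes); // throws on the first queued event: stats still null
        }
        pendingEvents.clear();
        updatePendingTasks();
    }

    // Proposed order: update pending tasks first so the stats field exists.
    public void onVertexStartedFixed() {
        updatePendingTasks();
        for (long[] sizes : pendingEvents) {
            processEvent(sizes);
        }
        pendingEvents.clear();
    }

    public long totalFor(int partition) {
        return partitionStats[partition];
    }

    public static void main(String[] args) {
        StartOrderSketch buggy = new StartOrderSketch(2);
        buggy.queueEvent(new long[] {10, 20});
        try {
            buggy.onVertexStartedBuggy();
        } catch (IllegalStateException e) {
            System.out.println("buggy order: " + e.getMessage());
        }

        StartOrderSketch fixed = new StartOrderSketch(2);
        fixed.queueEvent(new long[] {10, 20});
        fixed.queueEvent(new long[] {5, 5});
        fixed.onVertexStartedFixed();
        System.out.println("fixed order: " + fixed.totalFor(0) + ", " + fixed.totalFor(1));
    }
}
```

Running main prints "buggy order: partition stats not initialized" followed by "fixed order: 15, 25". In the sketch a simple null check stands in for whatever state check raises the IllegalStateException in the real code; the only point it models is the call order at vertex start.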
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)