[
https://issues.apache.org/jira/browse/SPARK-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcelo Vanzin updated SPARK-22162:
-----------------------------------
Affects Version/s: (was: 2.3.0)
> Executors and the driver use inconsistent Job IDs during the new RDD commit
> protocol
> ------------------------------------------------------------------------------------
>
> Key: SPARK-22162
> URL: https://issues.apache.org/jira/browse/SPARK-22162
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Reza Safi
> Assignee: Reza Safi
> Fix For: 2.3.0
>
>
> After SPARK-18191 commit in pull request 15769, using the new commit protocol
> it is possible that driver and executors uses different jobIds during a rdd
> commit.
> In the old code, the variable stageId is part of the closure used to define
> the task as you can see here:
>
> [https://github.com/apache/spark/blob/9c8deef64efee20a0ddc9b612f90e77c80aede60/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L1098]
> As a result, a TaskAttemptId is constructed in executors using the same
> "stageId" as the driver, since it is a value that is serialized in the
> driver. Also the value of stageID is actually the rdd.id which is assigned
> here:
> [https://github.com/apache/spark/blob/9c8deef64efee20a0ddc9b612f90e77c80aede60/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L1084]
> However, after the change in pull request 15769, the value is no longer part
> of the task closure, which gets serialized by the driver. Instead, it is
> pulled from the taskContext as you can see
> here:[https://github.com/apache/spark/pull/15769/files#diff-dff185cb90c666bce445e3212a21d765R103]
> and then that value is used to construct the TaskAttemptId on the executors:
> [https://github.com/apache/spark/pull/15769/files#diff-dff185cb90c666bce445e3212a21d765R134]
> taskContext has a stageID value which will be set in DAGScheduler. So after
> the change unlike the old code which a rdd.id was used, an actual stage.id is
> used which can be different between executors and the driver since it is no
> longer serialized.
> In summary, the old code consistently used rddId, and just incorrectly named
> it "stageId".
> The new code uses a mix of rddId and stageId. There should be a consistent ID
> between executors and the drivers.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]