[ https://issues.apache.org/jira/browse/SPARK-42582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693579#comment-17693579 ]

Tengfei Huang edited comment on SPARK-42582 at 2/26/23 3:27 AM:
----------------------------------------------------------------

This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] [~Ngone51]

Created this ticket to track the issue of inconsistent persisted RDD blocks.


was (Author: ivoson):
This is also discussed in PR: https://github.com/apache/spark/pull/39459

cc [~mridulm80] cc [~Ngone51]

Created this ticket to track the issue of inconsistent persisted RDD blocks.

> Persisted RDD blocks can be inconsistent if the RDD computation is 
> indeterminate
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-42582
>                 URL: https://issues.apache.org/jira/browse/SPARK-42582
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.2
>            Reporter: Tengfei Huang
>            Priority: Major
>
> When an RDD includes indeterminate operations, its results can differ each 
> time it is recomputed.
> When such an RDD is cached, multiple replicas of the same RDD block may 
> hold different data. Here is an example:
> 1. Task A generates the RDD block rdd_1_1 on executor E1;
> 2. Task B on executor E2 tries to fetch rdd_1_1 remotely from E1 but fails, 
> so it computes the partition itself and caches another copy of the block on 
> E2.
> If the results on E1 and E2 differ, we end up with two blocks for the same 
> RDD partition holding different data.
> Reads of that partition can then return different data depending on which 
> block they hit, which is unexpected behavior.
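The failure sequence in the description can be reproduced with a toy model. This is a minimal Python sketch, not Spark code: `Executor`, `compute_partition`, and `get_or_compute` are hypothetical names, and a per-call attempt counter stands in for an indeterminate operation (for example, one that uses random numbers). It only mimics the fallback path described in the issue: read the local cache, else read the remote replica, else recompute and cache a new local copy.

```python
import itertools

# Hypothetical stand-in for indeterminism: every evaluation sees a new attempt number.
_attempts = itertools.count()


class Executor:
    """Toy model of an executor's block store (hypothetical, not Spark's API)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}       # block id -> cached partition data
        self.reachable = True  # set to False to simulate a failed remote fetch


def compute_partition(inputs):
    """Indeterminate computation: the same inputs yield different output per call."""
    attempt = next(_attempts)
    return [x * 10 + attempt for x in inputs]


def get_or_compute(block_id, local, remote, inputs):
    """Return the block's data, following the fallback path from the issue."""
    if block_id in local.blocks:
        return local.blocks[block_id]
    if remote.reachable and block_id in remote.blocks:
        return remote.blocks[block_id]  # reusing the existing replica stays consistent
    data = compute_partition(inputs)    # recomputation may produce different data
    local.blocks[block_id] = data       # ...and it is cached as a second copy
    return data


e1, e2 = Executor("E1"), Executor("E2")
inputs = [1, 2, 3]

# Step 1: task A computes and caches rdd_1_1 on E1.
get_or_compute("rdd_1_1", e1, e2, inputs)

# Step 2: the fetch from E1 fails, so task B on E2 recomputes and caches its own copy.
e1.reachable = False
get_or_compute("rdd_1_1", e2, e1, inputs)

print(e1.blocks["rdd_1_1"], e2.blocks["rdd_1_1"])  # prints [10, 20, 30] [11, 21, 31]
```

In this model nothing forces the recomputed copy to match the replica that could not be fetched, so two cached blocks for the same partition end up disagreeing, which is the inconsistency this ticket tracks.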



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
