[
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-29037:
------------------------------------
Assignee: Apache Spark
> [Core] Spark gives duplicate result when an application was killed and rerun
> ----------------------------------------------------------------------------
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0, 2.3.3
> Reporter: feiwang
> Assignee: Apache Spark
> Priority: Major
> Attachments: screenshot-1.png
>
>
> This affects InsertIntoHadoopFsRelation operations.
> Case A:
> Application appA runs an insert overwrite on table table_a with static
> partition overwrite.
> It was killed while committing tasks because one task hung,
> and part of its committed task output was left under
> /path/table_a/_temporary/0/.
> Then we rerun appA. It reuses the staging dir /path/table_a/_temporary/0/
> and executes successfully, but it also commits the data left behind by the
> killed application to the destination dir, producing duplicate results.
> Case B:
> Application appA runs an insert overwrite on table table_a.
> Application appB also runs an insert overwrite on table table_a.
> They execute concurrently, and both may use /path/table_a/_temporary/0/
> as their work path, so their results may be corrupted.
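The staging-dir collision described above can be sketched outside Spark. The snippet below is only an illustration, not Spark's actual commit code: it mimics the `_temporary/<attemptId>/` layout used by Hadoop's FileOutputCommitter with a fixed attempt id of 0, and the file names and commit logic are assumptions made for the demo.

```python
import os
import shutil
import tempfile

def staging_dir(table_path, attempt_id=0):
    # Layout modeled on FileOutputCommitter: <table>/_temporary/<attemptId>/.
    # The attempt id is assumed to be the constant 0 across reruns,
    # which is what makes the killed run and the rerun share a path.
    return os.path.join(table_path, "_temporary", str(attempt_id))

def commit(table_path):
    # "Commit" moves every file found under the staging dir into the
    # table dir -- including files left behind by a killed application.
    staging = staging_dir(table_path)
    for name in os.listdir(staging):
        shutil.move(os.path.join(staging, name), os.path.join(table_path, name))
    shutil.rmtree(os.path.join(table_path, "_temporary"))

table = tempfile.mkdtemp()

# Run 1 (killed before commit): leaves a task output file in the staging dir.
os.makedirs(staging_dir(table))
with open(os.path.join(staging_dir(table), "part-00000"), "w") as f:
    f.write("stale row\n")

# Run 2 (rerun): reuses the same staging path, writes its output, commits.
with open(os.path.join(staging_dir(table), "part-00001"), "w") as f:
    f.write("fresh row\n")
commit(table)

# Both the stale and the fresh file end up in the destination dir.
print(sorted(os.listdir(table)))  # → ['part-00000', 'part-00001']
```

A rerun that chose a fresh staging dir (e.g. one keyed by a unique application attempt) would not pick up the stale part file, which is the essence of the fix this issue asks for.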
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]