[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-29037:
------------------------------------

    Assignee: Apache Spark

> [Core] Spark gives duplicate result when an application was killed and rerun
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-29037
>                 URL: https://issues.apache.org/jira/browse/SPARK-29037
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.3.3
>            Reporter: feiwang
>            Assignee: Apache Spark
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> This affects InsertIntoHadoopFsRelation operations.
> Case A:
> Application appA runs an insert overwrite on table table_a with static
> partition overwrite.
> It was killed while committing tasks because one task hung.
> Part of the output of its committed tasks is left under
> /path/table_a/_temporary/0/.
> We then rerun appA. It reuses the staging dir /path/table_a/_temporary/0/
> and executes successfully.
> However, it also commits the data left behind by the killed application to
> the destination dir.
> Case B:
> Application appA runs an insert overwrite on table table_a.
> Application appB also runs an insert overwrite on table table_a.
> They execute concurrently, and both may use /path/table_a/_temporary/0/
> as their workPath.
> Their results may be corrupted.
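
To make Case A concrete, here is a minimal Python sketch that simulates the behaviour on a local filesystem. It is not Spark's or Hadoop's commit code. The helper names (write_task_output, commit_job, run_case_a, shared_staging, per_job_staging) and the per-job directory layout are hypothetical and exist only for illustration. The assumption is that job commit moves every file it finds in the staging dir to the destination, so a staging dir shared across runs leaks the killed run's output, while a staging dir keyed by a per-run job id does not.

```python
import os
import shutil
import tempfile


def write_task_output(staging_dir, file_name):
    """Simulate a committed task by writing one output file into the staging dir."""
    os.makedirs(staging_dir, exist_ok=True)
    with open(os.path.join(staging_dir, file_name), "w") as f:
        f.write("row\n")


def commit_job(staging_dir, dest_dir):
    """Simulate job commit: move every file in the staging dir to the destination."""
    for name in sorted(os.listdir(staging_dir)):
        shutil.move(os.path.join(staging_dir, name), os.path.join(dest_dir, name))
    shutil.rmtree(staging_dir)


def shared_staging(table_dir, job_id):
    # Mirrors the report: every run uses <table>/_temporary/0 regardless of job.
    return os.path.join(table_dir, "_temporary", "0")


def per_job_staging(table_dir, job_id):
    # Hypothetical alternative: one staging dir per job id.
    return os.path.join(table_dir, "_temporary", job_id)


def run_case_a(staging_for):
    """Killed first run followed by a successful rerun; returns committed file names."""
    table_dir = tempfile.mkdtemp()
    try:
        # Run 1 commits one task, then the application is killed before job commit.
        write_task_output(staging_for(table_dir, "run1"), "part-00000-run1")

        # Run 2 (the rerun) commits both tasks and then commits the job.
        staging = staging_for(table_dir, "run2")
        write_task_output(staging, "part-00000-run2")
        write_task_output(staging, "part-00001-run2")
        commit_job(staging, table_dir)

        return sorted(n for n in os.listdir(table_dir) if n.startswith("part-"))
    finally:
        shutil.rmtree(table_dir)


if __name__ == "__main__":
    print("shared staging: ", run_case_a(shared_staging))
    print("per-job staging:", run_case_a(per_job_staging))
```

With the shared staging dir the rerun also commits part-00000-run1, which is the duplicate described in Case A. With a per-job staging dir only the rerun's two files reach the destination. The same isolation would keep the concurrent writers in Case B from seeing each other's task output, although this sketch does not model concurrency or cleanup of stale staging dirs.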



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

