Bill Chambers created SPARK-16234:
-------------------------------------
Summary: Speculative Task may not be able to overwrite file
Key: SPARK-16234
URL: https://issues.apache.org/jira/browse/SPARK-16234
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.0
Reporter: Bill Chambers
With spark.speculation set to true, I'm running a large Spark job that writes
Parquet with SaveMode.Overwrite.
When a task straggles, Spark launches a speculative copy of that task.
However, this comes with a risk: even though SaveMode.Overwrite is selected,
the job fails because the output file already exists, regardless of whether
the speculative copy or the original task finishes first.
java.io.IOException: /...some-file.../part-r-00049-401da178-3343-43a4-9c8d-277cc0173bf9.gz.parquet already exists
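
The race described above can be modelled without Spark. The following is only
a toy sketch, not Spark's actual write path: write_part, run_attempts and the
shortened file name are hypothetical, and exclusive file creation stands in
for whatever check raises the IOException. It assumes both attempts of the
same task target the same output path.

```python
import os
import tempfile


def write_part(path: str, payload: bytes) -> None:
    """Hypothetical stand-in for a task attempt writing its output file.

    Mode "xb" creates the file exclusively and raises FileExistsError
    if another attempt has already created the same path.
    """
    with open(path, "xb") as f:
        f.write(payload)


def run_attempts(out_dir: str) -> list:
    """Run an original and a speculative attempt against one output path."""
    # Both attempts target the same final file (illustrative name only).
    target = os.path.join(out_dir, "part-r-00049.gz.parquet")
    outcomes = []
    for attempt in ("original", "speculative"):
        try:
            write_part(target, attempt.encode())
            outcomes.append((attempt, "ok"))
        except FileExistsError:
            # Whichever attempt arrives second finds the file in place.
            outcomes.append((attempt, "already exists"))
    return outcomes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for attempt, status in run_attempts(d):
            print(attempt, status)
```

In this model the attempts run one after the other, so the second one always
fails. Swapping their order only changes which attempt reports the error,
which matches the report that the job fails whichever task completes first.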