[
https://issues.apache.org/jira/browse/SPARK-29649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-29649:
----------------------------------
Component/s: SQL
> Stop task set if FileAlreadyExistsException was thrown when writing to output
> file
> ----------------------------------------------------------------------------------
>
> Key: SPARK-29649
> URL: https://issues.apache.org/jira/browse/SPARK-29649
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.0.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> We already know that task attempts that do not clean up their output files in
> the staging directory can cause job failure (SPARK-27194). There were
> proposals to fix this by changing the output filename or by deleting existing
> output files, but none of these proposals is completely reliable.
> The difficulty is that once a previously failed task attempt has written an
> output file, that file is still under the same staging directory at the next
> task attempt, even if the new attempt writes under a different file name.
> If the job is going to fail eventually anyway, there is no point in re-running
> the task until the maximum number of attempts is reached. For long-running
> jobs, re-running the task can waste a lot of time.
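A minimal sketch of the proposed behavior, with a local stand-in class for org.apache.hadoop.fs.FileAlreadyExistsException so the snippet is self-contained (the names TaskFailureAction, Retry, AbortTaskSet, and onTaskFailure are illustrative, not Spark's actual scheduler API): when a write fails with FileAlreadyExistsException, abort the task set instead of retrying, because the stale file in the staging directory makes every further attempt fail the same way.

```scala
// Stand-in for org.apache.hadoop.fs.FileAlreadyExistsException, to keep
// this sketch free of Hadoop dependencies.
class FileAlreadyExistsException(msg: String) extends Exception(msg)

// Hypothetical classification of a failed task attempt.
sealed trait TaskFailureAction
case object Retry extends TaskFailureAction
case object AbortTaskSet extends TaskFailureAction

// Decide whether a failed task attempt is worth retrying. Retrying after
// FileAlreadyExistsException is pointless: the previous attempt's output
// file remains in the staging directory, so the next attempt fails too.
def onTaskFailure(e: Throwable): TaskFailureAction = e match {
  case _: FileAlreadyExistsException => AbortTaskSet
  case _                             => Retry
}

assert(onTaskFailure(new FileAlreadyExistsException("part-00000 exists")) == AbortTaskSet)
assert(onTaskFailure(new RuntimeException("executor lost")) == Retry)
```

Failing fast this way surfaces the real error to the user immediately rather than after spark.task.maxFailures doomed retries.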
--
This message was sent by Atlassian Jira
(v8.3.4#803005)