[ https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047024#comment-17047024 ]
Abhishek Madav commented on SPARK-30442:
----------------------------------------

When a task fails, for example because it cannot write to local disk or is interrupted, the output file is left empty but still materialized on the file system. The next attempt that retries the write to this location sees the existing file and fails with a FileAlreadyExistsException. As a result, the write path is not resilient to task failures.

> Write mode ignored when using CodecStreams
> ------------------------------------------
>
>                 Key: SPARK-30442
>                 URL: https://issues.apache.org/jira/browse/SPARK-30442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.4.4
>            Reporter: Jesse Collins
>            Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues,
> particularly with AWS tools, that make it impossible to retry.
> Ideally, this should be read from the write mode set for the DataWriter that
> is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]
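
The retry failure described in the comment can be illustrated with a minimal sketch. This is plain Python file I/O used as a stand-in, not Spark's or Hadoop's actual code: exclusive-create mode ("x") plays the role of opening the output stream with overwrite set to false. The function names, the SimulatedTaskFailure exception, and the file names are all hypothetical.

```python
import os
import tempfile


class SimulatedTaskFailure(Exception):
    """Hypothetical stand-in for a task dying after its output file was created."""


def write_task_output(path, rows, overwrite, fail_midway=False):
    # Assumption: "x" (exclusive create) approximates overwrite=false,
    # and "w" (truncate) approximates overwrite=true.
    mode = "w" if overwrite else "x"
    with open(path, mode) as out:
        if fail_midway:
            # The file already exists on disk here, but nothing was written.
            raise SimulatedTaskFailure("task interrupted before writing")
        for row in rows:
            out.write(row + "\n")


def run_with_retry(path, rows, overwrite):
    # First attempt: fails after the empty file has been materialized.
    try:
        write_task_output(path, rows, overwrite, fail_midway=True)
    except SimulatedTaskFailure:
        pass  # a scheduler would now launch a second attempt

    # Second attempt: same path, same write mode.
    try:
        write_task_output(path, rows, overwrite)
        return "retry succeeded"
    except FileExistsError:
        return "retry failed: file already exists"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_with_retry(os.path.join(tmp, "part-0.txt"), ["a", "b"], overwrite=False))
        print(run_with_retry(os.path.join(tmp, "part-1.txt"), ["a", "b"], overwrite=True))
```

With overwrite disabled, the second attempt trips over the empty file left by the first. With overwrite enabled, the retry truncates that file and completes. This matches the behaviour the issue asks for when the write mode permits overwriting.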