[ https://issues.apache.org/jira/browse/SPARK-29259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-29259:
----------------------------------
Affects Version/s: 3.0.0 (was: 2.4.4)
> Filesystem.exists is called even when not necessary for append save mode
> ------------------------------------------------------------------------
>
> Key: SPARK-29259
> URL: https://issues.apache.org/jira/browse/SPARK-29259
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Rahij Ramsharan
> Priority: Minor
> Fix For: 3.0.0
>
>
> When saving a DataFrame to a Hadoop file system
> ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L93]),
> Spark first checks whether the output path exists and only then inspects the
> SaveMode to determine whether it should actually insert data. However, the
> pathExists variable is not used in the case of SaveMode.Append. On some file
> systems the exists call can be expensive, so this PR makes that call only
> when necessary.
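
The idea described above can be illustrated with a small, self-contained sketch. This is not Spark's actual code: `CountingFileSystem` and `should_insert` are hypothetical names invented for illustration, and the Python `SaveMode` enum only mirrors the names of Spark's save modes. The sketch shows one way to defer the existence check so that it is skipped entirely for append mode.

```python
from enum import Enum


class SaveMode(Enum):
    """Mirrors the names of Spark's save modes (illustrative only)."""
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "error_if_exists"
    IGNORE = "ignore"


class CountingFileSystem:
    """Hypothetical stand-in for a file system whose exists() call is costly."""

    def __init__(self, existing_paths):
        self._existing = set(existing_paths)
        self.exists_calls = 0  # counts how often the file system is probed

    def exists(self, path):
        self.exists_calls += 1
        return path in self._existing


def should_insert(fs, path, mode):
    """Decide whether to write, probing the file system only when the mode needs it."""
    if mode is SaveMode.APPEND:
        # Append writes whether or not the path exists, so skip the probe.
        return True
    path_exists = fs.exists(path)
    if mode is SaveMode.ERROR_IF_EXISTS:
        if path_exists:
            raise FileExistsError(f"path {path} already exists")
        return True
    if mode is SaveMode.IGNORE:
        # Ignore writes only when nothing is there yet.
        return not path_exists
    # OVERWRITE: a real implementation would also remove existing data first
    # (omitted in this sketch).
    return True
```

With a `CountingFileSystem`, calling `should_insert` in `SaveMode.APPEND` leaves `exists_calls` at zero, while the other modes each trigger exactly one probe.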