Rahij Ramsharan created SPARK-29259:
---------------------------------------
Summary: FileSystem.exists is called even when not necessary for
append save mode
Key: SPARK-29259
URL: https://issues.apache.org/jira/browse/SPARK-29259
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.4
Reporter: Rahij Ramsharan
When saving a DataFrame to a Hadoop-compatible file system
([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L93]),
Spark first checks whether the output path exists and only then inspects the
SaveMode to determine whether it should actually insert data. However, the
pathExists variable is never used in the case of SaveMode.Append. On some file
systems the exists call can be expensive, so this PR makes that call only when
necessary.
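
One way to defer the check, shown here only as a minimal sketch and not as the actual patch, is to make the existence check a lazy val so that it is evaluated the first time a branch reads it. The Mode hierarchy, CountingFs, and shouldInsert below are hypothetical stand-ins for Spark's SaveMode, Hadoop's FileSystem, and the logic in InsertIntoHadoopFsRelationCommand; the counter exists only to make the number of exists calls observable.

```scala
// Hypothetical stand-ins for illustration; these are not Spark or Hadoop types.
sealed trait Mode
case object Append extends Mode
case object Overwrite extends Mode
case object ErrorIfExists extends Mode
case object Ignore extends Mode

// Fake file system that counts how often exists() is called.
final class CountingFs(existing: Set[String]) {
  var existsCalls: Int = 0

  def exists(path: String): Boolean = {
    existsCalls += 1
    existing.contains(path)
  }
}

object LazyExistsSketch {
  // Returns true if data should be written to `path` under the given mode.
  def shouldInsert(fs: CountingFs, path: String, mode: Mode): Boolean = {
    // lazy val: fs.exists runs at most once, and only if a branch reads it.
    lazy val pathExists = fs.exists(path)

    mode match {
      case Append =>
        true // never reads pathExists, so exists() is not called
      case Overwrite =>
        if (pathExists) {
          // a real command would delete the existing data here
        }
        true
      case ErrorIfExists =>
        if (pathExists) {
          throw new IllegalStateException(s"path $path already exists")
        }
        true
      case Ignore =>
        !pathExists
    }
  }
}
```

With this shape, an Append write makes no exists call, while the other modes make exactly one.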