Ian Hellstrom created SPARK-16410:
-------------------------------------
Summary: DataFrameWriter's jdbc method drops table in overwrite mode
Key: SPARK-16410
URL: https://issues.apache.org/jira/browse/SPARK-16410
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.2, 1.4.1
Reporter: Ian Hellstrom
According to the [API
documentation|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter],
the write mode {{overwrite}} should _overwrite the existing data_, which suggests
that the data is removed, i.e. the table is truncated.
However, that is not what happens in the [source
code|https://github.com/apache/spark/blob/0ad6ce7e54b1d8f5946dde652fa5341d15059158/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L421]:
{code}
if (mode == SaveMode.Overwrite && tableExists) {
  JdbcUtils.dropTable(conn, table)
  tableExists = false
}
{code}
This clearly shows that the table is first dropped and then recreated. This
causes two major issues:
* Existing indexes, partitions, etc. are completely lost.
* The case of identifiers may be changed without the user understanding why.
In my opinion, the table should be truncated, not dropped.
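A minimal sketch of the proposed behavior, assuming a hypothetical {{JdbcUtils.truncateTable}} helper that issues a {{TRUNCATE TABLE}} statement (no such helper exists at the linked revision):

{code}
if (mode == SaveMode.Overwrite && tableExists) {
  // Truncate instead of drop: removes all rows but keeps the table
  // definition, so indexes, partitions, and identifier case survive.
  // JdbcUtils.truncateTable is an assumed helper, not existing API.
  JdbcUtils.truncateTable(conn, table)
  // tableExists stays true; no CREATE TABLE is needed afterwards.
}
{code}

With this approach the subsequent insert path could skip table creation entirely, since the (empty) table is still present.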
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)