Ian Hellstrom created SPARK-16410:
-------------------------------------

             Summary: DataFrameWriter's jdbc method drops table in overwrite 
mode
                 Key: SPARK-16410
                 URL: https://issues.apache.org/jira/browse/SPARK-16410
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.2, 1.4.1
            Reporter: Ian Hellstrom


According to the [API 
documentation|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter],
 the write mode {{overwrite}} should _overwrite the existing data_, which suggests 
that only the data is removed, i.e. the table is truncated.

However, that is not what happens in the [source 
code|https://github.com/apache/spark/blob/0ad6ce7e54b1d8f5946dde652fa5341d15059158/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L421]:

{code}
if (mode == SaveMode.Overwrite && tableExists) {
  JdbcUtils.dropTable(conn, table)
  tableExists = false
}
{code}

This clearly shows that the table is first dropped and then recreated from the 
DataFrame's schema. This causes two major issues:
* Existing indexes, partitions, constraints, etc. are completely lost.
* The case of identifiers may be changed without the user understanding why.

In my opinion, the table should be truncated, not dropped.
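A minimal sketch of the proposed behaviour, reusing the {{conn}} and {{table}} values from the snippet above (the plain-JDBC {{TRUNCATE TABLE}} statement is illustrative only, not Spark's actual implementation, and some dialects may need different syntax):

{code}
if (mode == SaveMode.Overwrite && tableExists) {
  // Truncate instead of drop: keeps indexes, partitions, and identifier case intact.
  val statement = conn.createStatement()
  try {
    statement.executeUpdate(s"TRUNCATE TABLE $table")
  } finally {
    statement.close()
  }
  // tableExists stays true, so the subsequent CREATE TABLE step is skipped.
}
{code}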



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
