[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374266#comment-16374266 ]

Xiaoju Wu commented on SPARK-23493:
-----------------------------------

If that's the case, it should throw an exception telling users of the API to 
"please set the partition column last" instead of swallowing the issue and 
silently inserting the data incorrectly, right?
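
For reference: DataFrameWriter.insertInto resolves columns by position rather than by name, and saveAsTable with partitionBy stores the partition column last in the table schema, which is where the mismatch comes from. Below is a minimal caller-side sketch of the fail-fast behaviour suggested above (spark, data2 and table as in the report below; the check itself is only an illustration, not an existing Spark API):

val df = spark.createDataset(data2).toDF("col1", "col2", "col3")

// Column order of the existing table; saveAsTable has already moved the
// partition column to the end, e.g. Array("col2", "col3", "col1").
val tableCols = spark.table(table).columns

// insertInto matches by position, so a differing order silently misassigns values.
require(df.columns.sameElements(tableCols),
  s"Column order mismatch: DataFrame has ${df.columns.mkString(", ")} " +
    s"but table $table expects ${tableCols.mkString(", ")}")

df.write.insertInto(table)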

> insert-into depends on columns order, otherwise incorrect data inserted
> -----------------------------------------------------------------------
>
>                 Key: SPARK-23493
>                 URL: https://issues.apache.org/jira/browse/SPARK-23493
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Xiaoju Wu
>            Priority: Minor
>
> insert-into only works correctly when the partitionBy key columns are set last:
> val data = Seq(
>   (7, "test1", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> import spark.implicits._
> val table = "default.tbl"
> spark
>   .createDataset(data)
>   .toDF("col1", "col2", "col3")
>   .write
>   .partitionBy("col1")
>   .saveAsTable(table)
> val data2 = Seq(
>   (7, "test2", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> spark
>   .createDataset(data2)
>   .toDF("col1", "col2", "col3")
>   .write
>   .insertInto(table)
> sql("select * from " + table).show()
> +---------+----+----+
> |     col2|col3|col1|
> +---------+----+----+
> |test#test| 0.0|   8|
> |    test1| 1.0|   7|
> |    test3| 0.0|   9|
> |        8|null|   0|
> |        9|null|   0|
> |        7|null|   1|
> +---------+----+----+
>  
> If you try inserting with SQL, the issue is the same.
> val data = Seq(
>   (7, "test1", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> import spark.implicits._
> val table = "default.tbl"
> spark
>   .createDataset(data)
>   .toDF("col1", "col2", "col3")
>   .write
>   .partitionBy("col1")
>   .saveAsTable(table)
> val data2 = Seq(
>   (7, "test2", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> sql("insert into " + table + " values(7,'test2',1.0)")
> sql("select * from " + table).show()
> +---------+----+----+
> |     col2|col3|col1|
> +---------+----+----+
> |test#test| 0.0|   8|
> |    test1| 1.0|   7|
> |    test3| 0.0|   9|
> |        7|null|   1|
> +---------+----+----+
> No exception was thrown, since I only called insertInto without partitionBy; 
> the data was simply inserted incorrectly. The issue is related to column 
> order: if I instead partitionBy col3, which is the last column, it works 
> (a workaround sketch follows after the final output below).
> val data = Seq(
>   (7, "test1", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> import spark.implicits._
> val table = "default.tbl"
> spark
>   .createDataset(data)
>   .toDF("col1", "col2", "col3")
>   .write
>   .partitionBy("col3")
>   .saveAsTable(table)
> val data2 = Seq(
>   (7, "test2", 1.0),
>   (8, "test#test", 0.0),
>   (9, "test3", 0.0)
> )
> spark
>   .createDataset(data2)
>   .toDF("col1", "col2", "col3")
>   .write
>   .insertInto(table)
> sql("select * from " + table).show()
> +----+---------+----+
> |col1|     col2|col3|
> +----+---------+----+
> |   8|test#test| 0.0|
> |   9|    test3| 0.0|
> |   8|test#test| 0.0|
> |   9|    test3| 0.0|
> |   7|    test1| 1.0|
> |   7|    test2| 1.0|
> +----+---------+----+
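>  
> A possible workaround for the failing partitionBy("col1") case above, sketched on 
> the assumption that reading the table's column order back via spark.table is 
> acceptable: reorder the incoming DataFrame by name to match the table schema, 
> where the partition column sits last, so that the position-based insertInto lines up.
> import org.apache.spark.sql.functions.col
> // Table column order after saveAsTable with partitionBy("col1") is (col2, col3, col1);
> // select the DataFrame's columns in that same order before the positional insert.
> val aligned = spark
>   .createDataset(data2)
>   .toDF("col1", "col2", "col3")
>   .select(spark.table(table).columns.map(col): _*)
> aligned.write.insertInto(table)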


