[ 
https://issues.apache.org/jira/browse/SPARK-28495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-28495:
-----------------------------------
    Description: 
In Spark version 2.4 and earlier, when inserting into a table, Spark casts the 
data types of the input query to the data types of the target table by 
coercion. This can be very confusing, e.g. a user who mistakenly writes string 
values to an int column gets a silent cast instead of an error.
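
A minimal sketch of the legacy behavior (table name and values are 
hypothetical):
{code:sql}
CREATE TABLE t1 (i INT) USING parquet;
-- Under the legacy policy the string is silently coerced to INT;
-- an unparseable value becomes NULL instead of raising an error.
INSERT INTO t1 VALUES ('abc');
SELECT * FROM t1;  -- NULL
{code}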

In data source V2, by default, only upcasting is allowed when inserting data 
into a table. E.g. int -> long and int -> string are allowed, while decimal -> 
double and long -> int are not. The rules of UpCast were originally created 
for Dataset type coercion. They are quite strict and differ from the behavior 
of all existing popular DBMSs. This is a breaking change, and existing queries 
may break after the 3.0 release.
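
For example, under the upcast-only rules an insert like the following is 
rejected at analysis time (hypothetical tables; exact error messages may 
differ):
{code:sql}
CREATE TABLE longs (l BIGINT) USING parquet;
INSERT INTO longs VALUES (CAST(1 AS INT));    -- OK: int -> long is an upcast
CREATE TABLE ints (i INT) USING parquet;
INSERT INTO ints VALUES (CAST(1 AS BIGINT));  -- rejected: long -> int may truncate
{code}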

Following the ANSI SQL standard is the most appropriate solution, as the 
community voted on the dev list. 
For more details, see the discussion at 
http://apache-spark-developers-list.1001551.n3.nabble.com/Discuss-Follow-ANSI-SQL-on-table-insertion-td27531.html#a27562
 and https://github.com/apache/spark/pull/25453 .

This task is to add the ANSI store assignment policy as the default option of 
the configuration "spark.sql.storeAssignmentPolicy".

  was:
In Spark version 2.4 and earlier, when inserting into a table, Spark casts the 
data types of the input query to the data types of the target table by 
coercion. This can be very confusing, e.g. a user who mistakenly writes string 
values to an int column gets a silent cast instead of an error.

In data source V2, by default, only upcasting is allowed when inserting data 
into a table. E.g. int -> long and int -> string are allowed, while decimal -> 
double and long -> int are not. The rules of UpCast were originally created 
for Dataset type coercion. They are quite strict and differ from the behavior 
of all existing popular DBMSs. This is a breaking change, and it could hurt 
some Spark users after the 3.0 release.

This PR proposes that we follow the rules of store assignment (Section 9.2) 
in ANSI SQL. Two significant differences from UpCast:
1. Any numeric type can be assigned to another numeric type.
2. TimestampType can be assigned to DateType.

The new behavior is consistent with PostgreSQL. It is more explainable and 
acceptable than using UpCast.
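
A minimal sketch of the two relaxations listed above (table names are 
hypothetical):
{code:sql}
CREATE TABLE doubles (d DOUBLE) USING parquet;
INSERT INTO doubles VALUES (CAST(1.0 AS DECIMAL(10, 2)));   -- numeric -> numeric: OK
CREATE TABLE dates (d DATE) USING parquet;
INSERT INTO dates VALUES (TIMESTAMP '2019-07-24 00:00:00'); -- timestamp -> date: OK
{code}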





> Introduce ANSI store assignment policy for table insertion
> ----------------------------------------------------------
>
>                 Key: SPARK-28495
>                 URL: https://issues.apache.org/jira/browse/SPARK-28495
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> In Spark version 2.4 and earlier, when inserting into a table, Spark casts 
> the data types of the input query to the data types of the target table by 
> coercion. This can be very confusing, e.g. a user who mistakenly writes 
> string values to an int column gets a silent cast instead of an error.
> In data source V2, by default, only upcasting is allowed when inserting data 
> into a table. E.g. int -> long and int -> string are allowed, while decimal 
> -> double and long -> int are not. The rules of UpCast were originally 
> created for Dataset type coercion. They are quite strict and differ from 
> the behavior of all existing popular DBMSs. This is a breaking change, and 
> existing queries may break after the 3.0 release.
> Following the ANSI SQL standard is the most appropriate solution, as the 
> community voted on the dev list. 
> For more details, see the discussion at 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Discuss-Follow-ANSI-SQL-on-table-insertion-td27531.html#a27562
>  and https://github.com/apache/spark/pull/25453 .
> This task is to add the ANSI store assignment policy as the default option 
> of the configuration "spark.sql.storeAssignmentPolicy".



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
