+1 (binding)

On Thu, Oct 10, 2019 at 5:11 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
> Thanks for the great work, Gengliang!
>
> +1 for that.
> As I said before, the behaviour is pretty common in DBMSs, so the change
> helps DBMS users.
>
> Bests,
> Takeshi
>
>
> On Mon, Oct 7, 2019 at 5:24 PM Gengliang Wang <
> gengliang.w...@databricks.com> wrote:
>
>> Hi everyone,
>>
>> I'd like to call for a new vote on SPARK-28885
>> <https://issues.apache.org/jira/browse/SPARK-28885> "Follow ANSI store
>> assignment rules in table insertion by default" after revising the ANSI
>> store assignment policy (SPARK-29326
>> <https://issues.apache.org/jira/browse/SPARK-29326>).
>> When inserting a value into a column with a different data type, Spark
>> performs type coercion. Currently, we support 3 policies for the store
>> assignment rules: ANSI, legacy and strict, which can be set via the option
>> "spark.sql.storeAssignmentPolicy":
>> 1. ANSI: Spark performs the store assignment as per ANSI SQL. In
>> practice, the behavior is mostly the same as PostgreSQL. It disallows
>> certain unreasonable type conversions, such as converting `string` to `int`
>> and `double` to `boolean`. It throws a runtime exception if the value
>> is out of range (overflow).
>> 2. Legacy: Spark allows the store assignment as long as it is a valid
>> `Cast`, which is very loose. E.g., converting either `string` to `int` or
>> `double` to `boolean` is allowed. It is the current behavior in Spark 2.x
>> for compatibility with Hive. When inserting an out-of-range value into an
>> integral field, the low-order bits of the value are inserted (the same as
>> Java/Scala numeric type casting). For example, if 257 is inserted into a
>> field of Byte type, the result is 1.
>> 3. Strict: Spark doesn't allow any possible precision loss or data
>> truncation in store assignment, e.g., converting either `double` to `int`
>> or `decimal` to `double` is not allowed. The rules were originally designed
>> for the Dataset encoder. As far as I know, no mainstream DBMS uses this
>> policy by default.
>>
>> Currently, V1 data sources use the "Legacy" policy by default, while V2
>> uses "Strict". This proposal is to use the "ANSI" policy by default for
>> both V1 and V2 in Spark 3.0.
>>
>> This vote is open until Friday (Oct. 11).
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don't think this is a good idea because ...
>>
>> Thank you!
>>
>> Gengliang
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
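
For anyone who wants to try the three policies side by side, below is a minimal
Scala sketch. Only "spark.sql.storeAssignmentPolicy" and its ANSI/LEGACY/STRICT
values come from the proposal above; the table name, column, and local-mode
session are hypothetical, and the comments restate the behavior described in the
proposal rather than guaranteeing output on any particular build.

import org.apache.spark.sql.SparkSession

object StoreAssignmentPolicyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("store-assignment-policy-demo")
      .getOrCreate()

    // Hypothetical target table with a single BYTE (TINYINT) column.
    spark.sql("CREATE TABLE t (b BYTE) USING parquet")

    // Legacy (Spark 2.x behavior for V1 sources): any valid Cast is accepted.
    // Per the proposal, an out-of-range value keeps only its low-order bits,
    // so inserting 257 into a Byte column yields 1.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")
    spark.sql("INSERT INTO t VALUES (257)")

    // ANSI (proposed default): the insert passes analysis, but an
    // out-of-range value raises a runtime exception instead of wrapping.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
    // spark.sql("INSERT INTO t VALUES (257)")  // expected to fail at runtime

    // Strict (current V2 default): any potential precision loss or
    // truncation (e.g. INT -> BYTE) is rejected at analysis time.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "STRICT")
    // spark.sql("INSERT INTO t VALUES (257)")  // expected to fail analysis

    spark.sql("SELECT * FROM t").show()
    spark.stop()
  }
}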