[ https://issues.apache.org/jira/browse/ARROW-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242793#comment-17242793 ]

Mike Seddon edited comment on ARROW-10793 at 12/2/20, 10:46 PM:
----------------------------------------------------------------

I strongly feel that DataFusion should adopt ANSI-style strict typing rather 
than silently suppressing errors and returning NULL values: after years of 
using DBMSs, users intuitively expect that if no error was thrown, then all 
operations completed successfully.
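
To make the distinction concrete (a hypothetical query; the outcomes shown are 
assumptions about the two proposed behaviours, not tested DataFusion output):

{code:sql}
SELECT CAST('abc' AS INT);

-- NULL-returning behaviour: the query succeeds and yields NULL
-- ANSI-style strict typing: the query fails with a cast error
{code}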

The default Spark behavior was inherited from Hive SQL, which I assume was 
originally built to support a business where absolute precision was not 
necessarily important. As part of the Spark 3.0 release a huge amount of effort 
was put into complying with ANSI standard SQL 
([https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html]), which is 
obviously a lot harder to retrofit than to start with.
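
For reference, Spark 3.0 gates this behind the spark.sql.ansi.enabled flag 
(false by default), so both behaviours can be reproduced from plain SQL:

{code:sql}
-- Default (Hive-inherited) behaviour: invalid casts yield NULL
SET spark.sql.ansi.enabled=false;
SELECT CAST('abc' AS INT);  -- NULL

-- ANSI mode: invalid casts raise a runtime error
SET spark.sql.ansi.enabled=true;
SELECT CAST('abc' AS INT);  -- throws, e.g. java.lang.NumberFormatException
{code}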

This also goes wider than type conversions: adopting the ANSI SQL standard 
(including functionality like [https://github.com/apache/arrow/pull/8688], 
which I think would require a CASE statement in ANSI SQL) should perhaps be 
agreed by the PMC so that there is a framework for assessing PRs against. 
Perhaps this ticket should be changed to a discussion of which SQL dialect 
DataFusion aims to support.
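
On the CASE point, the SQL standard defines several of its convenience 
functions as abbreviations for CASE, so one practical test for "ANSI or 
extension" is whether a proposed function reduces to one. A generic 
illustration (NULLIF here, not necessarily what that PR implements; t and x 
are hypothetical):

{code:sql}
-- NULLIF(v1, v2) is specified by the standard as an abbreviation for
--   CASE WHEN v1 = v2 THEN NULL ELSE v1 END
-- so the two expressions below are equivalent:
SELECT NULLIF(x, 0),
       CASE WHEN x = 0 THEN NULL ELSE x END
FROM t;
{code}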



> [Rust] [DataFusion] Decide on CAST behaviour for invalid inputs
> ---------------------------------------------------------------
>
>                 Key: ARROW-10793
>                 URL: https://issues.apache.org/jira/browse/ARROW-10793
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>
> This is a placeholder for now. See discussion on 
> [https://github.com/apache/arrow/pull/8794]
> Briefly, the issue is whether we want CAST to return null for invalid inputs 
> or to throw an error. Spark has different behavior depending on whether ANSI 
> mode is enabled or not.
> I'm not sure yet whether this is a DataFusion-specific issue or a more 
> general Arrow one. It needs a discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
