Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Dongjoon Hyun
Thank you for your opinions, Gengliang, Liang-Chi, Wenchen, Huaxin, Serge, Nicholas.
To Nicholas, the Apache Spark community already decided not to pursue the PostgreSQL dialect.
> I’m flagging this since Spark’s behavior differs in these cases from
> Postgres, as described in the ticket.
Please

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Nicholas Chammas
This is a side issue, but I’d like to bring people’s attention to SPARK-28024. Cases 2, 3, and 4 described in that ticket are still problems today on master (I just rechecked) even with ANSI mode enabled. Well, maybe not problems, but I’m flagging this since Spark’s behavior differs in these

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Serge Rielau
+1, it's the wrapping on math overflows that does it for me.
On Apr 12, 2024, at 9:36 AM, huaxin gao wrote:
> +1
> On Thu, Apr 11, 2024 at 11:18 PM L. C. Hsieh wrote:
>> +1
>> I believe ANSI mode is well developed after many releases. No doubt it could
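For readers unfamiliar with the "wrapping on math overflows" mentioned in this message, here is a minimal sketch in plain Python. It is not Spark code; the helper names `add_wrapping` and `add_checked` are hypothetical and only contrast silent 32-bit wrap-around with the fail-on-overflow style that ANSI mode aims for.

```python
# Illustration only: plain Python, not Spark's implementation.
# Bounds of a signed 32-bit integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def add_wrapping(a: int, b: int) -> int:
    """Two's-complement 32-bit addition: an overflow silently wraps around."""
    return (a + b - INT_MIN) % 2**32 + INT_MIN


def add_checked(a: int, b: int) -> int:
    """32-bit addition that raises on overflow instead of wrapping."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError(f"integer overflow: {a} + {b}")
    return result


if __name__ == "__main__":
    # Wrapping turns a large positive sum into a negative number.
    print(add_wrapping(INT_MAX, 1))
    try:
        add_checked(INT_MAX, 1)
    except OverflowError as exc:
        print("error:", exc)
```

The wrapped result is a valid-looking number, which is why this kind of failure tends to go unnoticed until the data is used downstream.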

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread huaxin gao
+1
On Thu, Apr 11, 2024 at 11:18 PM L. C. Hsieh wrote:
> +1
>
> I believe ANSI mode is well developed after many releases. No doubt it
> could be used.
> Since it is very easy to disable it to restore to current behavior, I
> guess the impact could be limited.
> Do we have known the possible

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Wenchen Fan
+1, the existing "NULL on error" behavior is terrible for data quality. I have one concern about error reporting with DataFrame APIs. Query execution is lazy and where the error happens can be far away from where the dataframe/column was created. We are improving it (PR
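As a rough illustration of why "NULL on error" can hurt data quality, the sketch below uses plain Python, not Spark. The functions `cast_to_int_lenient` and `cast_to_int_strict` and the sample rows are hypothetical stand-ins for a lenient cast that yields a null and a strict cast that raises.

```python
# Illustration only: plain Python, not Spark's cast implementation.
from typing import List, Optional


def cast_to_int_lenient(value: str) -> Optional[int]:
    """Return None (a stand-in for NULL) when the input cannot be parsed."""
    try:
        return int(value)
    except ValueError:
        return None


def cast_to_int_strict(value: str) -> int:
    """Raise ValueError when the input cannot be parsed."""
    return int(value)


def sum_ignoring_nulls(values: List[Optional[int]]) -> int:
    """Sum the values, skipping None the way a SQL SUM skips NULL."""
    return sum(v for v in values if v is not None)


# Hypothetical input with one malformed row ("3o" instead of "30").
rows = ["10", "20", "3o"]

lenient = [cast_to_int_lenient(v) for v in rows]
total = sum_ignoring_nulls(lenient)  # the bad row silently drops out of the sum

if __name__ == "__main__":
    print(lenient, total)
    try:
        [cast_to_int_strict(v) for v in rows]
    except ValueError as exc:
        print("error:", exc)
```

With the lenient cast the aggregate still returns a number, so nothing signals that a row was lost; the strict cast surfaces the malformed value at the point where it is read.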

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread L. C. Hsieh
+1 I believe ANSI mode is well developed after many releases. No doubt it could be used. Since it is very easy to disable it to restore to current behavior, I guess the impact could be limited. Do we have known the possible impacts such as what are the major changes (e.g., what kind of

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Gengliang Wang
+1, enabling Spark's ANSI SQL mode in version 4.0 will significantly enhance data quality and integrity. I fully support this initiative.
> In other words, the current Spark ANSI SQL implementation becomes the first implementation for Spark SQL users to face at first while providing

[DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-11 Thread Dongjoon Hyun
Hi, All.
Thanks to you, we've been achieving many things and have ongoing SPIPs. I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly by asking your opinions about Apache Spark's ANSI SQL mode.
https://issues.apache.org/jira/browse/SPARK-44111
Prepare Apache Spark