Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread Chao Sun
+1

On Fri, Apr 12, 2024 at 4:23 PM Xiao Li wrote:
> +1
>
> On Fri, Apr 12, 2024 at 14:30 bo yang wrote:
>> +1
>>
>> On Fri, Apr 12, 2024 at 12:34 PM huaxin gao wrote:
>>> +1
>>>
>>> On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote:
>>>> +1
>>>>
>>>> Thank you!

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread Xiao Li
+1

On Fri, Apr 12, 2024 at 14:30 bo yang wrote:
> +1
>
> On Fri, Apr 12, 2024 at 12:34 PM huaxin gao wrote:
>> +1
>>
>> On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote:
>>> +1
>>>
>>> Thank you!
>>>
>>> I hope we can customize `dev/merge_spark_pr.py` script per repository

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread bo yang
+1

On Fri, Apr 12, 2024 at 12:34 PM huaxin gao wrote:
> +1
>
> On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote:
>> +1
>>
>> Thank you!
>>
>> I hope we can customize `dev/merge_spark_pr.py` script per repository
>> after this PR.
>>
>> Dongjoon.
>>
>> On 2024/04/12 03:28:36 "L. C. Hsieh" w

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread L. C. Hsieh
+1

Thank you, Dongjoon. Yeah, we may need to customize the merge script for a particular repository.

On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote:
>
> +1
>
> Thank you!
>
> I hope we can customize `dev/merge_spark_pr.py` script per repository after
> this PR.
>
> Dongjoon.
>
> On 2024/04

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Nicholas Chammas
This is a side issue, but I’d like to bring people’s attention to SPARK-28024. Cases 2, 3, and 4 described in that ticket are still problems today on master (I just rechecked) even with ANSI mode enabled. Well, maybe not problems, but I’m flagging this since Spark’s behavior differs in these c

Re: [DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Dongjoon Hyun
Thank you for volunteering, Wenchen.

Dongjoon.

On 2024/04/12 15:11:04 Wenchen Fan wrote:
> Hi all,
>
> It's close to the previously proposed 4.0.0 release date (June 2024), and I
> think it's time to prepare for it and discuss the ongoing projects:
>
> - ANSI by default
> - Spark Connect

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread serge rielau . com
+1. It's the wrapping on math overflows that does it for me.

On Apr 12, 2024, at 9:36 AM, huaxin gao wrote:
> +1
>
> On Thu, Apr 11, 2024 at 11:18 PM L. C. Hsieh wrote:
>> +1
>>
>> I believe ANSI mode is well developed after many releases. No doubt it could be
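The "wrapping on math overflows" point can be illustrated with a small pure-Python sketch, not Spark code. It assumes 32-bit INT semantics and contrasts the legacy behavior (the result silently wraps around) with ANSI behavior (the overflow is reported as an error). In Spark itself the switch is the `spark.sql.ansi.enabled` setting; the names `add_int32` and `ArithmeticOverflow` below are hypothetical and exist only for this illustration.

```python
INT_MIN = -(2 ** 31)
INT_MAX = 2 ** 31 - 1


class ArithmeticOverflow(Exception):
    """Hypothetical stand-in for the error an ANSI-mode engine raises on overflow."""


def add_int32(a: int, b: int, ansi: bool) -> int:
    """Add two 32-bit ints with legacy (wrapping) or ANSI (checked) semantics."""
    exact = a + b  # Python ints never overflow, so this is the exact mathematical sum
    if INT_MIN <= exact <= INT_MAX:
        return exact
    if ansi:
        # ANSI semantics: an out-of-range result is an error, not a wrong value
        raise ArithmeticOverflow(f"integer overflow: {a} + {b}")
    # Legacy semantics: keep the low 32 bits and reinterpret them as signed
    return (exact + 2 ** 31) % 2 ** 32 - 2 ** 31


if __name__ == "__main__":
    print(add_int32(INT_MAX, 1, ansi=False))  # -2147483648 (silently wrapped)
    try:
        add_int32(INT_MAX, 1, ansi=True)
    except ArithmeticOverflow as err:
        print("ANSI mode:", err)
```

The wrapped result is a valid-looking number, which is exactly why it can go unnoticed downstream; the checked variant stops at the first bad row instead.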

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread huaxin gao
+1

On Thu, Apr 11, 2024 at 11:18 PM L. C. Hsieh wrote:
> +1
>
> I believe ANSI mode is well developed after many releases. No doubt it
> could be used.
> Since it is very easy to disable it to restore to current behavior, I
> guess the impact could be limited.
> Do we know the possible imp

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread huaxin gao
+1

On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote:
> +1
>
> Thank you!
>
> I hope we can customize `dev/merge_spark_pr.py` script per repository
> after this PR.
>
> Dongjoon.
>
> On 2024/04/12 03:28:36 "L. C. Hsieh" wrote:
> > Hi all,
> >
> > Thanks for all discussions in the thread of "Ve

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread Dongjoon Hyun
+1

Thank you!

I hope we can customize `dev/merge_spark_pr.py` script per repository after this PR.

Dongjoon.

On 2024/04/12 03:28:36 "L. C. Hsieh" wrote:
> Hi all,
>
> Thanks for all discussions in the thread of "Versioning of Spark
> Operator": https://lists.apache.org/thread/zhc7nb2sxm8jjxdp

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Wenchen Fan
+1, the existing "NULL on error" behavior is terrible for data quality. I have one concern about error reporting with DataFrame APIs. Query execution is lazy and where the error happens can be far away from where the dataframe/column was created. We are improving it (PR
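The "NULL on error" behavior can be sketched with a simplified string-to-INT cast in plain Python, not Spark code. Under legacy semantics a malformed value silently becomes NULL (`None` here). Under ANSI semantics the same input raises an error. The parsing rule (optional sign plus ASCII digits) and the names `cast_string_to_int` and `CastInvalidInput` are assumptions made for illustration. Spark's real parsing rules differ in detail, and in Spark the behavior is governed by `spark.sql.ansi.enabled`.

```python
import re
from typing import Optional

INT_MIN = -(2 ** 31)
INT_MAX = 2 ** 31 - 1


class CastInvalidInput(Exception):
    """Hypothetical stand-in for the error an ANSI-mode engine raises on a bad cast."""


def cast_string_to_int(s: str, ansi: bool) -> Optional[int]:
    """Simplified CAST(s AS INT): legacy returns None (SQL NULL), ANSI raises."""
    text = s.strip()
    value: Optional[int] = None
    # Simplified rule assumed here: optional sign followed by ASCII digits only
    if re.fullmatch(r"[+-]?[0-9]+", text):
        value = int(text)
        if not (INT_MIN <= value <= INT_MAX):
            value = None  # this sketch also treats out-of-range values as failed casts
    if value is None and ansi:
        raise CastInvalidInput(f"cannot cast {s!r} to INT")
    return value


if __name__ == "__main__":
    print(cast_string_to_int("42", ansi=False))   # 42
    print(cast_string_to_int("abc", ansi=False))  # None (bad input silently becomes NULL)
    try:
        cast_string_to_int("abc", ansi=True)
    except CastInvalidInput as err:
        print("ANSI mode:", err)
```

The sketch does not model the lazy-evaluation concern raised above. In a real DataFrame pipeline, the ANSI error surfaces only when an action runs, which may be far from where the cast was written.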

[DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Wenchen Fan
Hi all,

It's close to the previously proposed 4.0.0 release date (June 2024), and I think it's time to prepare for it and discuss the ongoing projects:

- ANSI by default
- Spark Connect GA
- Structured Logging
- Streaming state store data source
- new data type VARIANT
- STRING