Ya, it sounds like that. Could you link those items to the following JIRA?

https://issues.apache.org/jira/browse/SPARK-44111
Prepare Apache Spark 4.0.0

Dongjoon.

On Tue, Jun 20, 2023 at 12:45 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> That seems like a really good reason for a major version change given the
> % of PySpark users and the fact we are (effectively) tied to pandas APIs.
>
> On Tue, Jun 20, 2023 at 12:24 PM Bjørn Jørgensen <bjornjorgen...@gmail.com>
> wrote:
>
>> One big thing for 4.0 will be that pandas API on Spark will support
>> pandas version 2.0.
>>
>> With the major release of pandas 2.0.0 on April 3, 2023, numerous
>> breaking changes have been introduced. So, we have made the decision to
>> postpone addressing these breaking changes until the next major release of
>> Spark, version 4.0.0, to minimize disruptions for our users and provide a
>> more seamless upgrade experience.
>>
>> The pandas 2.0.0 release includes a significant number of updates, such
>> as API removals, changes in API behavior, parameter removals, parameter
>> behavior changes, and bug fixes. We have planned the following approach for
>> each item:
>>
>> - *API Removals*: Removed APIs will remain deprecated in Spark 3.5.0,
>> provide appropriate warnings, and will be removed in Spark 4.0.0.
>>
>> - *API Behavior Changes*: APIs with changed behavior will retain the
>> behavior in Spark 3.5.0, provide appropriate warnings, and will align the
>> behavior with pandas in Spark 4.0.0.
>>
>> - *Parameter Removals*: Removed parameters will remain deprecated in
>> Spark 3.5.0, provide appropriate warnings, and will be removed in Spark
>> 4.0.0.
>>
>> - *Parameter Behavior Changes*: Parameters with changed behavior will
>> retain the behavior in Spark 3.5.0, provide appropriate warnings, and will
>> align the behavior with pandas in Spark 4.0.0.
>>
>> - *Bug Fixes*: Bug fixes mainly related to correctness issues will be
>> fixed in Spark 3.5.0.
>>
>> *To recap, all breaking changes related to pandas 2.0.0 will be supported
>> in Spark 4.0.0,* *and will remain deprecated with appropriate warnings in
>> Spark 3.5.0.*
>>
>> https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
>>
>> On Tue, Jun 20, 2023 at 06:18, Dongjoon Hyun <dongj...@apache.org> wrote:
>>
>>> Hi, Herman.
>>>
>>> This is a series of discussions as I re-summarized here.
>>>
>>> You can find some context in the previous timeline thread.
>>>
>>> 2023-05-30 Apache Spark 4.0 Timeframe?
>>> https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>>
>>> Could you reply there to collect your timeline suggestions? We can
>>> discuss more there.
>>>
>>> Dongjoon.
>>>
>>> On Mon, Jun 19, 2023 at 1:58 PM Herman van Hovell <her...@databricks.com>
>>> wrote:
>>>
>>>> Dongjoon, I am not sure if I follow the line of thought here.
>>>>
>>>> Multiple people have asked for clarification on what Spark 4.0 would
>>>> mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me
>>>> to this list. However, you choose to single out Xiao because he asks this
>>>> question and wants to do a preview release as well? So again, what does
>>>> Spark 4 mean, and why does it need to take almost a year? Historically,
>>>> major Spark releases tend to break APIs, but if it only entails changing to
>>>> Scala 2.13 and dropping support for JDK 8, then we could also just release
>>>> a month after 3.5.
>>>>
>>>> How about we do this? We get 3.5 released, and afterwards we do a
>>>> couple of meetings where we build this roadmap. Using that, we can -
>>>> hopefully - have a grounded discussion.
>>>>
>>>> Cheers,
>>>> Herman
>>>>
>>>> On Mon, Jun 19, 2023 at 4:01 PM Dongjoon Hyun <dongj...@apache.org>
>>>> wrote:
>>>>
>>>>> Thank you. I reviewed the threads, vote and result once more.
>>>>>
>>>>> I found that I missed the binding vote mark on Holden in the vote
>>>>> result email. The following should be "-0: Holden Karau *". Sorry for this
>>>>> mistake, Holden and all.
>>>>>
>>>>> > -0: Holden Karau
>>>>>
>>>>> To Hyukjin, I disagree with you on the following point because the
>>>>> thread started clearly with your and Sean's Apache Spark 4.0 requirement
>>>>> in order to move away from Scala 2.12. In addition, we also discussed
>>>>> another item (dropping Java 8) from another current dev thread. The vote
>>>>> scope and goal are clear and specific.
>>>>>
>>>>> > we're unclear on the picture of Spark 4.0.0.
>>>>>
>>>>> Instead of the vote scope and result, what is really unclear is what
>>>>> you propose here. If Xiao wants a preview, Xiao can propose the preview
>>>>> plan in more detail. It's welcome. If you have many 4.0 dev ideas which
>>>>> are not exposed to the community yet, please share them with the community.
>>>>> It's welcome, too. Apache Spark is an open source community. If you don't
>>>>> share it, there is no way for us to know what you want.
>>>>>
>>>>> Dongjoon
>>>>>
>>>>> On 2023/06/19 04:31:46 Hyukjin Kwon wrote:
>>>>> > The major concerns raised in the thread were that we should initiate
>>>>> > the discussion for the below first:
>>>>> > - Apache Spark 4.0.0 Preview (and Dates)
>>>>> > - Apache Spark 4.0.0 Items
>>>>> > - Apache Spark 4.0.0 Plan Adjustment
>>>>> >
>>>>> > before setting the timeline for Spark 4.0.0 because we're unclear on
>>>>> > the picture of Spark 4.0.0. So discussing the 4.0.0 timeline first is
>>>>> > the opposite order procedurally.
>>>>> > The vote passed as a procedural issue, but I would prefer to consider
>>>>> > this as a tentative date, and we would probably need another vote to
>>>>> > adjust the date considering the plans, preview dates, and items we aim
>>>>> > for 4.0.0.
>>>>> >
>>>>> > On Sat, 17 Jun 2023 at 04:33, Dongjoon Hyun <dongj...@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> > > This was a part of the following on-going discussions.
>>>>> > >
>>>>> > > 2023-05-28 Apache Spark 3.5.0 Expectations (?)
>>>>> > > https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
>>>>> > >
>>>>> > > 2023-05-30 Apache Spark 4.0 Timeframe?
>>>>> > > https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
>>>>> > >
>>>>> > > 2023-06-05 ASF policy violation and Scala version issues
>>>>> > > https://lists.apache.org/thread/k7gr65wt0fwtldc7hp7bd0vkg1k93rrb
>>>>> > >
>>>>> > > 2023-06-12 [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)
>>>>> > > https://lists.apache.org/thread/r0zn6rd8y25yn2dg59ktw3ttrwxzqrfb
>>>>> > >
>>>>> > > I'm looking forward to seeing the upcoming detailed discussions
>>>>> > > including the following
>>>>> > > - Apache Spark 4.0.0 Preview (and Dates)
>>>>> > > - Apache Spark 4.0.0 Items
>>>>> > > - Apache Spark 4.0.0 Plan Adjustment
>>>>> > >
>>>>> > > Please initiate the discussion.
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Dongjoon.
>>>>> > >
>>>>> > > On 2023/06/16 19:30:42 Dongjoon Hyun wrote:
>>>>> > > > The vote passes with 6 +1s (4 binding +1s), one -0, and one -1.
>>>>> > > > Thank you all for your participation and
>>>>> > > > especially your additional comments during this voting,
>>>>> > > > Mridul, Hyukjin, and Jungtaek.
>>>>> > > >
>>>>> > > > (* = binding)
>>>>> > > > +1:
>>>>> > > > - Dongjoon Hyun *
>>>>> > > > - Huaxin Gao *
>>>>> > > > - Liang-Chi Hsieh *
>>>>> > > > - Kazuyuki Tanimura
>>>>> > > > - Chao Sun *
>>>>> > > > - Jia Fan
>>>>> > > >
>>>>> > > > -0: Holden Karau
>>>>> > > >
>>>>> > > > -1: Xiao Li *
>>>>> > > >
>>>>> > >
>>>>> > > ---------------------------------------------------------------------
>>>>> > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>> > >
>>>>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>