Are there other Python dependencies we should consider upgrading at the same time?

On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

> So to be clear, the minimum version check is 0.23 and the
> Jenkins test is 0.24.
>
> I’m ok with this. I hope someone will test 0.23 on releases before
> we sign off, though?
> We should maybe add this to the release instruction notes?
>
> ------------------------------
> *From:* shane knapp <skn...@berkeley.edu>
> *Sent:* Friday, June 14, 2019 10:23:56 AM
> *To:* Bryan Cutler
> *Cc:* Dongjoon Hyun; Holden Karau; Hyukjin Kwon; dev
> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>
> excellent. i shall not touch anything. :)
>
> On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutl...@gmail.com> wrote:
>
>> Shane, I think 0.24.2 is probably more common right now, so if we were to
>> pick one to test against, I still think it should be that one. Our Pandas
>> usage in PySpark is fairly conservative, so it's unlikely that we
>> will add something that would break 0.23.x.
>>
>> On Fri, Jun 14, 2019 at 10:10 AM shane knapp <skn...@berkeley.edu> wrote:
>>
>>> ah, ok... should we downgrade the testing env on jenkins then? any
>>> specific version?
>>>
>>> shane, who is loath (and i mean LOATH) to touch python envs ;)
>>>
>>> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutl...@gmail.com> wrote:
>>>
>>>> I should have stated this earlier, but when the user does something
>>>> that requires Pandas, the minimum version is checked against what was
>>>> imported and will raise an exception if it is a lower version. So I'm
>>>> concerned that using 0.24.2 might be a little too new for users running
>>>> older clusters. To give some release dates, 0.23.2 was released about a
>>>> year ago, 0.24.0 in January and 0.24.2 in March.
>>>>

I think that, given we’re switching to requiring Python 3 and are also a bit of a way from cutting a release, 0.24 could be OK as a min version requirement.

>>>>
>>>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <skn...@berkeley.edu>
>>>> wrote:
>>>>
>>>>> just so everyone knows, our python 3.6 testing infra is currently on
>>>>> 0.24.2...
>>>>>
>>>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Thank you for this effort, Bryan!
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <hol...@pigscanfly.ca>
>>>>>> wrote:
>>>>>>
>>>>>>> I’m +1 for upgrading, although since this is probably the last
>>>>>>> chance we’ll have to bump version numbers easily, I’d suggest 0.24.2.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am +1 to go for 0.23.2. Testing PyArrow and pandas combinations
>>>>>>>> brings some overhead. Spark 3 should be a good time to increase it.
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler <cutl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We would like to discuss increasing the minimum supported version
>>>>>>>>> of Pandas in Spark, which is currently 0.19.2.
>>>>>>>>>
>>>>>>>>> Pandas 0.19.2 was released nearly 3 years ago, and there are some
>>>>>>>>> workarounds in PySpark that could be removed if such an old version
>>>>>>>>> is not required. This will help to keep the code clean and reduce
>>>>>>>>> maintenance effort.
>>>>>>>>>
>>>>>>>>> The change is targeted for the Spark 3.0.0 release; see
>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>>>> thought is to bump the version to 0.23.2, but we would like to discuss
>>>>>>>>> it before making a change. Does anyone else have thoughts on this?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Bryan
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>> https://amzn.to/2MaRAG9
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Shane Knapp
>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>> https://rise.cs.berkeley.edu
>>>>>
>>>>
>>>
>>
>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau