Oh btw, why is it 0.23.2, not 0.23.0 or 0.23.4?

On Sat, 15 Jun 2019, 06:56 Bryan Cutler, <cutl...@gmail.com> wrote:
> Yeah, PyArrow is the only other PySpark dependency we check for a
> minimum version. We updated that not too long ago to 0.12.1, which I
> think we are still good on for now.
>
> On Fri, Jun 14, 2019 at 11:36 AM Felix Cheung <felixcheun...@hotmail.com> wrote:
>
>> How about PyArrow?
>>
>> ------------------------------
>> *From:* Holden Karau <hol...@pigscanfly.ca>
>> *Sent:* Friday, June 14, 2019 11:06:15 AM
>> *To:* Felix Cheung
>> *Cc:* Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp
>> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>>
>> Are there other Python dependencies we should consider upgrading at the
>> same time?
>>
>> On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>>> So to be clear: the minimum version check is 0.23,
>>> and the Jenkins test environment is 0.24.
>>>
>>> I'm OK with this. I hope someone will test 0.23 on releases, though,
>>> before we sign off?
>>
>> We should maybe add this to the release instruction notes?
>>
>>> ------------------------------
>>> *From:* shane knapp <skn...@berkeley.edu>
>>> *Sent:* Friday, June 14, 2019 10:23:56 AM
>>> *To:* Bryan Cutler
>>> *Cc:* Dongjoon Hyun; Holden Karau; Hyukjin Kwon; dev
>>> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>>>
>>> excellent. i shall not touch anything. :)
>>>
>>> On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <cutl...@gmail.com> wrote:
>>>
>>>> Shane, I think 0.24.2 is probably more common right now, so if we were
>>>> to pick one to test against, I still think it should be that one. Our
>>>> Pandas usage in PySpark is pretty conservative, so it's pretty unlikely
>>>> that we will add something that would break 0.23.x.
>>>>
>>>> On Fri, Jun 14, 2019 at 10:10 AM shane knapp <skn...@berkeley.edu> wrote:
>>>>
>>>>> ah, ok... should we downgrade the testing env on jenkins then? any
>>>>> specific version?
>>>>>
>>>>> shane, who is loath (and i mean LOATH) to touch python envs ;)
>>>>>
>>>>> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <cutl...@gmail.com> wrote:
>>>>>
>>>>>> I should have stated this earlier, but when the user does something
>>>>>> that requires Pandas, the minimum version is checked against what was
>>>>>> imported, and an exception is raised if the imported version is lower
>>>>>> (a sketch of this check follows the thread). So I'm concerned that
>>>>>> 0.24.2 might be a little too new for users running older clusters. To
>>>>>> give some release dates: 0.23.2 was released about a year ago, 0.24.0
>>>>>> in January, and 0.24.2 in March.
>>>>>
>> I think, given that we're switching to requiring Python 3 and are also a
>> bit of a way from cutting a release, 0.24 could be OK as a minimum
>> version requirement.
>>
>>>>>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <skn...@berkeley.edu> wrote:
>>>>>>
>>>>>>> just so everyone knows, our python 3.6 testing infra is currently on
>>>>>>> 0.24.2...
>>>>>>>
>>>>>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Thank you for this effort, Bryan!
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> I'm +1 for upgrading, although since this is probably the last easy
>>>>>>>>> chance we'll have to bump version numbers, I'd suggest 0.24.2.
>>>>>>>>>
>>>>>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am +1 to go for 0.23.2 - supporting old versions brings some
>>>>>>>>>> overhead to testing PyArrow and pandas combinations. Spark 3
>>>>>>>>>> should be a good time to increase it.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler <cutl...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> We would like to discuss increasing the minimum supported version
>>>>>>>>>>> of Pandas in Spark, which is currently 0.19.2.
>>>>>>>>>>>
>>>>>>>>>>> Pandas 0.19.2 was released nearly 3 years ago, and there are some
>>>>>>>>>>> workarounds in PySpark that could be removed if such an old
>>>>>>>>>>> version were no longer required. This would help keep the code
>>>>>>>>>>> clean and reduce maintenance effort.
>>>>>>>>>>>
>>>>>>>>>>> The change is targeted for the Spark 3.0.0 release; see
>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>>>>>> thought is to bump the version to 0.23.2, but we would like to
>>>>>>>>>>> discuss it before making a change. Does anyone else have thoughts
>>>>>>>>>>> on this?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Bryan
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>
>>>>>>> --
>>>>>>> Shane Knapp
>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>> https://rise.cs.berkeley.edu
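For context on the version gate Bryan describes above, here is a minimal
sketch of how such a lazy check can work: nothing is verified when pyspark
is imported, and the gate only runs at the point a Pandas-dependent feature
is used. This is a simplified illustration written for this thread, not
PySpark's actual source; the helper name and messages are illustrative, and
the 0.23.2 floor reflects the proposal under discussion.

    from distutils.version import LooseVersion

    # Proposed minimum under discussion for Spark 3.0.0 (illustrative).
    MINIMUM_PANDAS_VERSION = "0.23.2"

    def require_minimum_pandas_version():
        """Raise ImportError if a suitable version of Pandas is not installed."""
        try:
            import pandas
        except ImportError:
            raise ImportError(
                "Pandas >= %s must be installed to use this feature."
                % MINIMUM_PANDAS_VERSION)
        # Compare the imported version against the floor, as described above.
        if LooseVersion(pandas.__version__) < LooseVersion(MINIMUM_PANDAS_VERSION):
            raise ImportError(
                "Pandas >= %s must be installed; your version is %s."
                % (MINIMUM_PANDAS_VERSION, pandas.__version__))

    # Example: call the gate right before a Pandas-backed operation, so a
    # user without a new enough Pandas only fails at that call site:
    # require_minimum_pandas_version()

Because the check runs only at the point of use, bumping the constant
affects users exactly when they call a Pandas-backed API on an older
cluster, which is the compatibility concern raised in the thread.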