excellent. i shall not touch anything. :)

On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler <[email protected]> wrote:

> Shane, I think 0.24.2 is probably more common right now, so if we were to
> pick one to test against, I still think it should be that one. Our Pandas
> usage in PySpark is pretty conservative, so it's pretty unlikely that we
> will add something that would break 0.23.X.
>
> On Fri, Jun 14, 2019 at 10:10 AM shane knapp <[email protected]> wrote:
>
>> ah, ok... should we downgrade the testing env on jenkins then? any
>> specific version?
>>
>> shane, who is loath (and i mean LOATH) to touch python envs ;)
>>
>> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler <[email protected]> wrote:
>>
>>> I should have stated this earlier, but when the user does something
>>> that requires Pandas, the minimum version is checked against what was
>>> imported, and an exception is raised if the imported version is lower.
>>> So I'm concerned that using 0.24.2 might be a little too new for users
>>> running older clusters. To give some release dates: 0.23.2 was released
>>> about a year ago, 0.24.0 in January, and 0.24.2 in March.
>>>
>>> On Fri, Jun 14, 2019 at 9:27 AM shane knapp <[email protected]> wrote:
>>>
>>>> just so everyone knows, our python 3.6 testing infra is currently on
>>>> 0.24.2...
>>>>
>>>> On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun <[email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Thank you for this effort, Bryan!
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>> On Fri, Jun 14, 2019 at 4:24 AM Holden Karau <[email protected]> wrote:
>>>>>
>>>>>> I'm +1 for upgrading, although since this is probably the last easy
>>>>>> chance we'll have to bump version numbers, I'd suggest 0.24.2.
>>>>>>
>>>>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon <[email protected]> wrote:
>>>>>>
>>>>>>> I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow
>>>>>>> and pandas combinations. Spark 3 should be a good time to increase it.
>>>>>>>
>>>>>>> On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We would like to discuss increasing the minimum supported version
>>>>>>>> of Pandas in Spark, which is currently 0.19.2.
>>>>>>>>
>>>>>>>> Pandas 0.19.2 was released nearly 3 years ago, and there are some
>>>>>>>> workarounds in PySpark that could be removed if such an old version
>>>>>>>> were no longer required. This will help keep the code clean and
>>>>>>>> reduce maintenance effort.
>>>>>>>>
>>>>>>>> The change is targeted for the Spark 3.0.0 release; see
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-28041. The current
>>>>>>>> thought is to bump the version to 0.23.2, but we would like to
>>>>>>>> discuss it before making a change. Does anyone else have thoughts
>>>>>>>> on this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Bryan
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
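For readers curious what the guard Bryan describes might look like, below is a
minimal sketch of a lazy Pandas minimum-version check. The helper name,
constant, and error messages are illustrative assumptions, not PySpark's exact
code (the real check lives in pyspark.sql.utils):

    # Sketch of a minimum-version guard, in the spirit of the check
    # described above. NOTE: names and messages here are assumptions for
    # illustration, not PySpark's actual implementation.
    from distutils.version import LooseVersion

    MINIMUM_PANDAS_VERSION = "0.23.2"  # the proposed new floor

    def require_minimum_pandas_version():
        """Raise ImportError if Pandas is missing or below the floor."""
        try:
            import pandas
        except ImportError:
            raise ImportError(
                "Pandas >= %s must be installed" % MINIMUM_PANDAS_VERSION)
        if LooseVersion(pandas.__version__) < LooseVersion(MINIMUM_PANDAS_VERSION):
            raise ImportError(
                "Pandas >= %s must be installed; found %s"
                % (MINIMUM_PANDAS_VERSION, pandas.__version__))

Because the check runs only when a Pandas-dependent feature is actually used,
raising the floor would not affect users who never touch Pandas-backed APIs.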
