[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673591#comment-16673591 ]
shane knapp edited comment on SPARK-25079 at 11/2/18 7:26 PM:
--------------------------------------------------------------

i think we're ready to deploy python 3.5 (which will allow us to test pyarrow 0.10.0).

i created a working python 3.5 dist on my staging worker, then set up a build that:
1) scps over a hacked python/run-tests.py that adds the python 3.5 executable to the list
2) builds spark: `./build/mvn -DskipTests -Phadoop2.7 -Pyarn -Phive -Phive-thriftserver clean package`
3) runs the python tests: `./python/run-tests`

AND VOILA! IT WORKS!
[https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/93/console]

i've already staged the python 3.5 environment on all of the ubuntu workers, so here are my next steps:
1) advertise the switch-over on the dev list
2) update a bunch of stuff in the repo to point to python 3.5 [1]
3) delete the existing py3k anaconda env, then clone the 3.5 env to py3k

[~bryanc] [~srowen] [~yhuai]

[1]:
{noformat}
➜ spark git:(master) grep -rw "python3\.4" *
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java:    launcher.setConf(SparkLauncher.PYSPARK_DRIVER_PYTHON, "python3.4");
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java:    assertEquals("python3.4", launcher.builder.conf.get(
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:    "--conf", "spark.pyspark.driver.python=python3.4",
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:    conf3.get(PYSPARK_DRIVER_PYTHON.key) should be ("python3.4")
docs/rdd-programming-guide.md:$ PYSPARK_PYTHON=python3.4 bin/pyspark
python/run-tests.py:    python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
➜ spark git:(master) grep -r "py3k" *
dev/run-tests.py:    os.environ["PATH"] = "/home/anaconda/envs/py3k/bin:" + os.environ.get("PATH")
{noformat}


> [PYTHON] upgrade python 3.4 -> 3.5
> ----------------------------------
>
>                 Key: SPARK-25079
>                 URL: https://issues.apache.org/jira/browse/SPARK-25079
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, PySpark
>    Affects Versions: 2.3.1
>            Reporter:
> shane knapp
>            Assignee: shane knapp
>            Priority: Major
>
> for the impending arrow upgrade (https://issues.apache.org/jira/browse/SPARK-23874)
> we need to bump python 3.4 -> 3.5.
>
> i have been testing this here:
> [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69]
>
> my methodology:
> 1) upgrade python + arrow to 3.5 and 0.10.0
> 2) run python tests
> 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and
>    upgrade centos workers to python3.5
> 4) simultaneously do the following:
>    - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that
>      points to python3.5 (this is currently being tested here:
>      [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69])
>    - push a change to python/run-tests.py replacing 3.4 with 3.5
> 5) once the python3.5 change to run-tests.py is merged, we will need to
>    back-port this to all existing branches
> 6) then and only then can i remove the python3.4 -> python3.5 symlink

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
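The python/run-tests.py edit described above (swapping "python3.4" for "python3.5" in the interpreter list surfaced by the grep output) can be sketched as follows. This is a minimal sketch, not the real script: `shutil.which` stands in for the `which()` helper run-tests.py actually imports, and only the list contents come from the source.

```python
# Sketch of the python/run-tests.py interpreter-list change:
# "python3.4" becomes "python3.5" in the candidate list, and the
# comprehension keeps only interpreters actually installed on the
# worker. shutil.which is a stand-in for the script's which() helper.
from shutil import which

# before the change: ["python2.7", "python3.4", "pypy"]
candidate_execs = ["python2.7", "python3.5", "pypy"]
python_execs = [x for x in candidate_execs if which(x)]
print(python_execs)
```

Whatever mix of interpreters a worker has, the filter guarantees the test harness never tries to launch an executable that is missing from PATH.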
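The compatibility symlink from step 4 of the methodology can be sketched like this. The temp directory and the empty `python3.5` file are stand-ins for the real /home/anaconda/envs/py3k/bin and the staged interpreter on the workers; the one real step is the `ln -s`.

```shell
# Sketch of the python3.4 -> python3.5 compatibility symlink.
# ENV_BIN is a throwaway stand-in for /home/anaconda/envs/py3k/bin.
ENV_BIN="$(mktemp -d)"

# stand-in for the real python 3.5 interpreter staged on the workers
touch "${ENV_BIN}/python3.5"
chmod +x "${ENV_BIN}/python3.5"

# python3.4 -> python3.5, so builds that still invoke python3.4
# keep working until run-tests.py is updated on every branch
ln -s "${ENV_BIN}/python3.5" "${ENV_BIN}/python3.4"

readlink "${ENV_BIN}/python3.4"
```

This is why step 6 ("then and only then can i remove the symlink") has to wait for the back-ports: any branch whose run-tests.py still lists python3.4 resolves through this link.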