[ https://issues.apache.org/jira/browse/SPARK-39995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600297#comment-17600297 ]
Oleksandr Shevchenko commented on SPARK-39995:
----------------------------------------------

It definitely matters. It affects which dependencies and packages we can use (e.g. a DataSourceV2 API implementation for read and write), and it affects DX (Developer Experience) and installation, including the CD process for our code.

> PySpark installation doesn't support Scala 2.13 binaries
> --------------------------------------------------------
>
>                 Key: SPARK-39995
>                 URL: https://issues.apache.org/jira/browse/SPARK-39995
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Oleksandr Shevchenko
>            Priority: Major
>
> [PyPI|https://pypi.org/project/pyspark/] doesn't support Spark binary [installation|https://spark.apache.org/docs/latest/api/python/getting_started/install.html#using-pypi] for Scala 2.13.
> Currently, the setup [script|https://github.com/apache/spark/blob/master/python/pyspark/install.py] lets you set the Spark version, the Hadoop version (PYSPARK_HADOOP_VERSION), and the mirror (PYSPARK_RELEASE_MIRROR) used to download the Spark binaries, but the result is always a Scala 2.12-compatible build. There is no parameter to download "spark-3.3.0-bin-hadoop3-scala2.13.tgz".
> It's possible to download Spark manually and set SPARK_HOME accordingly, but that is hard to combine with pip or Poetry.
> Also, environment variables (e.g. PYSPARK_HADOOP_VERSION) are easy to use with pip and the CLI, but they cannot be expressed in package managers like Poetry.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)