Serena Ruan created SPARK-39779:
-----------------------------------
Summary: Support adding Maven packages while pip installing
Key: SPARK-39779
URL: https://issues.apache.org/jira/browse/SPARK-39779
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.3.0
Reporter: Serena Ruan
The goal is to support adding Maven packages to a pip-installable package,
including jars and resolvers, so that when Spark boots up it can
automatically look for the Maven packages on the classpath and install the
corresponding dependencies.
This idea comes up because, currently, for a Python package that internally
depends on jars (the way PySpark uses reflection into Spark source code),
making it work takes two steps: 1. pip install the Python package. 2. Add the
jar to the Spark configuration when starting the Spark session, for example
through spark.jars.packages.
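Step 2 above corresponds to passing Maven coordinates (in groupId:artifactId:version form) through spark.jars.packages when the session is created. A minimal sketch of today's manual workflow, where the delta-core coordinate is purely an illustrative example of a jar a pip package might depend on:

```python
# Sketch of the current manual step: the user must know the Maven
# coordinates of the package's jar dependencies and pass them explicitly
# when building the session.
packages = [
    "io.delta:delta-core_2.12:2.0.0",  # illustrative dependency, not from the issue
]
# spark.jars.packages takes a comma-separated list of Maven coordinates.
packages_conf = ",".join(packages)

# With pyspark installed, the configuration would be applied like this:
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .config("spark.jars.packages", packages_conf)
#          .getOrCreate())
print(packages_conf)
```

The proposal is to remove this second manual step entirely, since the pip package itself already knows which jars it needs.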
If we support the proposed functionality, we could ideally just declare the
package name and resolver while we pip install the package, and when the
Spark session starts it can look for those configurations on the Python
classpath and install them if they are not already present. This would
simplify the workflow for all Python developers whose packages internally
depend on Maven packages, and make PySpark more user-friendly.
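One way the proposed lookup could work — purely a sketch, since none of these names exist in Spark or in Python packaging — is for each pip package to ship a small record of Maven coordinates, which session startup collects and merges into the existing spark.jars.packages value, skipping coordinates that are already configured:

```python
# Hypothetical sketch of the merge step at session startup. The metadata
# source (declared_by_installed_packages) and merge_packages() are
# assumptions for illustration, not an existing Spark or packaging API.

def merge_packages(existing_conf: str, declared: list) -> str:
    """Append declared coordinates to spark.jars.packages, deduplicated."""
    current = [c for c in existing_conf.split(",") if c]
    for coord in declared:
        if coord not in current:  # only add what is not already configured
            current.append(coord)
    return ",".join(current)

# Coordinates that installed pip packages might declare in their metadata.
declared_by_installed_packages = [
    "io.delta:delta-core_2.12:2.0.0",
    "org.apache.hadoop:hadoop-aws:3.3.2",
]

# A user already configured the delta coordinate; it is not duplicated.
conf = merge_packages("io.delta:delta-core_2.12:2.0.0",
                      declared_by_installed_packages)
print(conf)
```

Resolvers could be handled the same way against spark.jars.repositories, which likewise takes a comma-separated list.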
--
This message was sent by Atlassian Jira
(v8.20.10#820010)