[
https://issues.apache.org/jira/browse/SPARK-39779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Serena Ruan updated SPARK-39779:
--------------------------------
Shepherd: Mark Hamilton
> Support adding maven packages while pip
> ---------------------------------------
>
> Key: SPARK-39779
> URL: https://issues.apache.org/jira/browse/SPARK-39779
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Serena Ruan
> Priority: Major
> Labels: maven, package, pyspark, python
>
> The goal is to support adding Maven packages to a pip-installable package,
> including jars and resolvers, so that when Spark boots it can automatically
> look for the Maven packages on the classpath and install the corresponding
> dependencies.
>
> This idea comes up because, currently, making a Python package work when it
> internally depends on jars (for example, a package that uses reflection on
> Spark source code, as PySpark does) takes two steps: 1. pip install the
> Python package. 2. Add the jar to the Spark configuration when starting the
> Spark session, for example through spark.jars.packages.
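A minimal sketch of step 2 above, the configuration a user must supply by hand today. The Maven coordinate, repository URL, and the helper function name are hypothetical illustrations, not part of any Spark API:

```python
def step2_spark_conf(coordinates, repositories=None):
    """Build the Spark configuration that pulls jars from Maven at startup.

    coordinates:  list of Maven coordinates ("groupId:artifactId:version")
    repositories: optional list of extra resolver URLs beyond Maven Central
    """
    conf = {"spark.jars.packages": ",".join(coordinates)}
    if repositories:
        conf["spark.jars.repositories"] = ",".join(repositories)
    return conf

# Hypothetical coordinate and resolver, for illustration only.
conf = step2_spark_conf(
    ["com.example:mylib_2.12:1.0.0"],
    ["https://repo.example.com/maven"],
)
```

Each key-value pair is then passed via `SparkSession.builder.config(...)` when the session is created, which is exactly the manual step this issue aims to eliminate.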
> If we supported the proposed functionality, users could ideally just supply
> the package name and resolver while pip installing the package; when the
> Spark session starts, it would look for those configurations on the Python
> classpath and install any that are not already present. This would simplify
> the workflow for all Python developers whose packages internally depend on
> Maven packages, and make PySpark more user-friendly.
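One way the "install them if they are not already present" step could behave, as a hedged sketch: coordinates advertised by pip-installed packages are merged into the user's existing spark.jars.packages value, skipping duplicates. The function name, coordinates, and discovery mechanism are all hypothetical, not an existing Spark API:

```python
def merge_maven_packages(existing, discovered):
    """Merge Maven coordinates discovered on the Python classpath into the
    user's existing spark.jars.packages value, skipping duplicates."""
    current = [c for c in existing.split(",") if c]
    for coord in discovered:
        if coord not in current:  # add only if not already configured
            current.append(coord)
    return ",".join(current)

# Hypothetical coordinates: one already configured, one newly discovered.
merged = merge_maven_packages(
    "com.example:already-there:1.0",
    ["com.example:already-there:1.0", "com.example:mylib_2.12:1.0.0"],
)
```

Deduplication matters here because a user may have configured the same coordinate manually, and Spark should not receive it twice in the comma-separated list.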
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]