[ 
https://issues.apache.org/jira/browse/SPARK-39779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serena Ruan updated SPARK-39779:
--------------------------------
    Shepherd: Mark Hamilton

> Support adding maven packages while pip
> ---------------------------------------
>
>                 Key: SPARK-39779
>                 URL: https://issues.apache.org/jira/browse/SPARK-39779
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Serena Ruan
>            Priority: Major
>              Labels: maven, package, pyspark, python
>
> The goal is to support adding maven packages to a pip-installable package, 
> including adding jars and resolvers, so that when Spark boots up it can 
> automatically look for the maven packages on the classpath and install the 
> corresponding dependencies.
>  
> This idea came up because currently, for a Python package that internally 
> depends on jars (for example, jars that use reflection on Spark source 
> code, as PySpark itself does), making it work takes two steps: 1. pip 
> install the Python package. 2. Add the jar to the Spark configuration when 
> starting the Spark session, for example through spark.jars.packages.
> If we supported the proposed functionality, we could ideally just add the 
> package name and resolver while pip installing the package, and when the 
> Spark session starts, it could look for those configurations on the Python 
> classpath and install them if they are not already present. This would 
> simplify the process for all Python developers whose packages internally 
> depend on maven packages, and would make PySpark more user-friendly.
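The current two-step workflow described above can be sketched as follows; this is a minimal illustration, and "com.example:some-library:1.0.0" is a placeholder Maven coordinate, not a real dependency of the issue:

```python
# Step 1 (shell, not shown here): pip install the Python package that
# needs the jar, e.g. `pip install some-library`.

# Step 2: supply the jar's Maven coordinates as Spark configuration
# when the session starts, via the existing spark.jars.packages key.
# "com.example:some-library:1.0.0" is a hypothetical coordinate.
maven_coordinate = "com.example:some-library:1.0.0"

# The configuration Spark reads at session startup; with the proposed
# feature, Spark would instead discover these coordinates from the
# pip-installed package on the Python classpath automatically.
conf = {"spark.jars.packages": maven_coordinate}

# Starting the session then looks like this (requires a working
# Spark/Java installation, so it is left commented out):
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .config("spark.jars.packages", conf["spark.jars.packages"])
#          .getOrCreate())
```

With the proposed change, the `.config("spark.jars.packages", ...)` step would no longer be needed manually, since the coordinates would travel with the pip package itself.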



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
