Serena Ruan created SPARK-39779:
-----------------------------------

             Summary: Support adding Maven packages while pip installing
                 Key: SPARK-39779
                 URL: https://issues.apache.org/jira/browse/SPARK-39779
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.3.0
            Reporter: Serena Ruan


The goal is to support declaring Maven packages (including jars and resolvers) 
in a pip-installable package, so that when Spark boots up it can automatically 
look for those Maven coordinates on the Python classpath and install the 
corresponding dependencies.

 

This idea comes up because, for a Python package that internally depends on 
jars (for example, one that uses reflection on Spark source code the way 
PySpark does), making it work currently takes two steps: 1. pip install the 
Python package. 2. Add the jar to the Spark configuration when starting the 
Spark session, for example through spark.jars.packages.
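
A minimal sketch of step 2 as it works today. The Maven coordinate below is a 
made-up example, not a dependency of any real package; the SparkSession part is 
shown commented out because it needs a running JVM, but spark.jars.packages is 
the real configuration key:

```python
# Sketch of the current two-step workflow this issue wants to simplify.
def maven_packages_conf(coordinates):
    """Join Maven coordinates into the comma-separated string that
    the spark.jars.packages configuration expects."""
    return ",".join(coordinates)

# Illustrative coordinate only (groupId:artifactId:version).
coords = ["com.example:example-lib_2.12:1.0.0"]
conf_value = maven_packages_conf(coords)

# Step 2 today: the user must pass the coordinates explicitly when
# building the session (requires pyspark; not started here).
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .config("spark.jars.packages", conf_value)
#          .getOrCreate())
```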

With the proposed functionality, we could ideally just declare the package 
name and resolver when we pip install the package; when the Spark session 
starts, it would look for those declarations on the Python classpath and 
install any that are not already present. This would simplify the workflow for 
all Python developers whose packages internally depend on Maven packages and 
make PySpark more user-friendly.
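
One hypothetical way to implement the discovery side, sketched here with 
importlib.metadata entry points. The entry-point group name and both function 
names are invented for illustration; nothing below is part of Spark today:

```python
# Hypothetical sketch: pip-installed packages advertise Maven coordinates
# via an entry-point group (the group name here is made up), and session
# startup merges them into whatever the user already configured.
from importlib.metadata import entry_points

ENTRY_POINT_GROUP = "spark.jars.packages"  # hypothetical group name

def discover_declared_packages():
    """Collect Maven coordinates declared by installed pip packages."""
    try:
        eps = entry_points(group=ENTRY_POINT_GROUP)  # Python 3.10+
    except TypeError:
        # Older Pythons return a dict keyed by group name.
        eps = entry_points().get(ENTRY_POINT_GROUP, [])
    return [ep.value for ep in eps]

def merge_packages(existing_conf, discovered):
    """Merge discovered coordinates into a spark.jars.packages value,
    keeping the user's explicit settings and skipping duplicates."""
    current = [c for c in existing_conf.split(",") if c]
    for coord in discovered:
        if coord not in current:
            current.append(coord)
    return ",".join(current)
```

The merge step preserves user-supplied coordinates and only appends what is 
missing, so an explicit spark.jars.packages setting always wins.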



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]