[ https://issues.apache.org/jira/browse/SPARK-38438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rafal Wojdyla updated SPARK-38438:
----------------------------------
    Description:

Reproduction:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# later on we want to update spark.jars.packages, e.g. with spark-hats:
s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# the line below returns None, the config was not propagated:
s._sc._conf.get("spark.jars.packages")
{code}

Stopping the context doesn't help. In fact it's even more confusing, because the configuration is updated but has no effect:

{code:python}
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()
s.stop()

s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# this line now returns 'za.co.absa:spark-hats_2.12:0.2.2', but the context
# doesn't download the jar/package as it would if there were no global context,
# so the extra package is unusable: it's neither downloaded nor added to the
# classpath.
s._sc._conf.get("spark.jars.packages")
{code}

One workaround is to stop the context AND kill the JVM gateway, which amounts to a kind of hard reset:

{code:python}
from pyspark import SparkContext
from pyspark.sql import SparkSession

# default session:
s = SparkSession.builder.getOrCreate()

# Hard reset:
s.stop()
s._sc._gateway.shutdown()
SparkContext._gateway = None
SparkContext._jvm = None

s = (SparkSession.builder
     .config("spark.jars.packages", "za.co.absa:spark-hats_2.12:0.2.2")
     .getOrCreate())

# Now we are guaranteed a new Spark session, and the packages are
# downloaded, added to the classpath, etc.
{code}
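For convenience, the hard reset above can be wrapped in a small helper. This is only a sketch of the same workaround: it relies on the same private PySpark internals (_sc, _gateway, _jvm), and the helper name recreate_session is just for illustration:

{code:python}
from pyspark import SparkContext
from pyspark.sql import SparkSession


def recreate_session(extra_conf):
    """Stop any active session and its JVM gateway, then build a fresh one.

    Sketch of the hard-reset workaround above; it depends on private
    PySpark internals (_sc, _gateway, _jvm) and may break across versions.
    """
    active = SparkSession.getActiveSession()
    if active is not None:
        sc = active._sc
        active.stop()
        # Shutting down the Py4J gateway forces a brand-new JVM, so the
        # packages are actually resolved and added to the classpath.
        sc._gateway.shutdown()
        SparkContext._gateway = None
        SparkContext._jvm = None
    builder = SparkSession.builder
    for key, value in extra_conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


s = recreate_session({"spark.jars.packages": "za.co.absa:spark-hats_2.12:0.2.2"})
{code}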
> Can't update spark.jars.packages on existing global/default context
> -------------------------------------------------------------------
>
>                 Key: SPARK-38438
>                 URL: https://issues.apache.org/jira/browse/SPARK-38438
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 3.2.1
>         Environment: py: 3.9
> spark: 3.2.1
>            Reporter: Rafal Wojdyla
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org