Hi Nupur,

Is what you're trying to do already possible via the
spark.{driver,executor}.userClassPathFirst options?

https://github.com/apache/spark/blob/b890fdc8df64f1d0b0f78b790d36be883e852b0d/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L853
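
For reference, a minimal sketch of how those options are set (the jar path is
illustrative, not from your mail):

import org.apache.spark.SparkConf

// Hedged sketch: userClassPathFirst is usually supplied at submit time,
// since the driver-side setting must be known before the driver JVM starts
// (it is honored in cluster mode); shown here via SparkConf for brevity.
val conf = new SparkConf()
  .set("spark.jars", "/somepath/new-jar-1.0.0.jar") // illustrative path
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")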

On Wed, Jul 22, 2020 at 5:50 PM nupurshukla <nupur14shu...@gmail.com> wrote:

> Hello,
>
> I am prototyping a change in the behavior of the spark.jars conf for my
> use-case. The spark.jars conf is used to specify a list of jars to include
> on the driver and executor classpaths.
>
> *Current behavior:* The spark.jars conf value is not read until after the
> JVM has already started and the system classloader has already been
> created, and hence the jars added using this conf get "appended" to the
> Spark classpath. This means that Spark looks for a class in its default
> classpath first, and only then at the paths specified in the spark.jars
> conf.
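>
> A quick way to see which classpath entry wins at runtime, a hedged sketch
> using only standard JVM APIs (com.example.SampleClass is a hypothetical
> class present in more than one jar):
>
> // Prints the jar or directory that actually supplied the class.
> val cls = Class.forName("com.example.SampleClass")
> println(cls.getProtectionDomain.getCodeSource.getLocation)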
>
> *Proposed prototype:* I am proposing a new behavior where spark.jars takes
> precedence over the Spark default classpath in terms of how jars are
> discovered. This can be achieved by using the
> spark.{driver,executor}.extraClassPath conf. That conf modifies the actual
> launch command of the driver (or executors), and hence its path is
> "prepended" to the classpath and thus takes precedence over the default
> classpath. Could the behavior of the spark.jars conf be modified by adding
> its value to the value of spark.{driver,executor}.extraClassPath during
> argument parsing in SparkSubmitArguments.scala
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>,
> so that we achieve the precedence order (highest first): jars specified in
> spark.jars > spark.{driver,executor}.extraClassPath > Spark default
> classpath?
>
> *Pseudo sample code:*
> In loadEnvironmentArguments()
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>:
>
> // spark.jars is comma-separated, while extraClassPath entries are joined
> // with the platform path separator, so the jar list is converted first.
> // Prepending the jars keeps them ahead of any user-supplied
> // extraClassPath, matching the intended precedence order.
> if (jars != null) {
>   val jarsAsClassPath = jars.split(",").mkString(java.io.File.pathSeparator)
>   driverExtraClassPath = if (driverExtraClassPath != null) {
>     jarsAsClassPath + java.io.File.pathSeparator + driverExtraClassPath
>   } else {
>     jarsAsClassPath
>   }
> }
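>
> The executor side would presumably need the same treatment; a hedged
> sketch, assuming a corresponding executorExtraClassPath variable is
> available at this point (hypothetical; spark.executor.extraClassPath may
> be plumbed through elsewhere):
>
> if (jars != null) {
>   val jarsAsClassPath = jars.split(",").mkString(java.io.File.pathSeparator)
>   executorExtraClassPath = if (executorExtraClassPath != null) {
>     jarsAsClassPath + java.io.File.pathSeparator + executorExtraClassPath
>   } else {
>     jarsAsClassPath
>   }
> }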
>
>
> *As an example*, consider these jars:
> sample-jar-1.0.0.jar, present in Spark's default classpath
> sample-jar-2.0.0.jar, present on all nodes of the cluster at path
> /<somepath>/
> new-jar-1.0.0.jar, present on all nodes of the cluster at path /<somepath>/
> (and not in the Spark default classpath)
>
> And two scenarios: two Spark jobs are submitted with the following
> spark.jars conf values:
>
> <http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3705/Capture.png>
> (table of the two scenarios attached as an image)
>
> What are your thoughts on this? Could this have any undesired side-effects?
> Or has this already been explored and there are some known issues with this
> approach?
>
> Thanks,
> Nupur
>
