[
https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcelo Masiero Vanzin resolved SPARK-29472.
--------------------------------------------
Resolution: Won't Fix
> Mechanism for Excluding Jars at Launch for YARN
> -----------------------------------------------
>
> Key: SPARK-29472
> URL: https://issues.apache.org/jira/browse/SPARK-29472
> Project: Spark
> Issue Type: New Feature
> Components: YARN
> Affects Versions: 2.4.4
> Reporter: Abhishek Modi
> Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from
> Spark’s classpath at launch time. This would complement the existing
> {{extraClassPath}} mechanism for adding jars to the classpath.
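>
> For contrast, a minimal sketch of the additive side that already exists
> (the jar path is made up for illustration):
> {code:scala}
> import org.apache.spark.SparkConf
>
> // extraClassPath can only *add* entries; there is no matching knob
> // to drop jars that Spark itself ships.
> val conf = new SparkConf()
>   .set("spark.driver.extraClassPath", "/opt/libs/parquet-1.11.0.jar")
>   .set("spark.executor.extraClassPath", "/opt/libs/parquet-1.11.0.jar")
> {code}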
>
> *Context*
> The Spark build ships its dependency jars in the {{/jars}} directory, and
> these jars become part of the executor’s classpath. By default on YARN,
> they are packaged and distributed to containers at launch time
> ({{spark-submit}}).
>
> While developing Spark applications, customers sometimes need to debug
> against a different version of a dependency. This is difficult when the
> dependency (e.g. Parquet 1.11.0) is one that Spark already ships in
> {{/jars}} (e.g. Parquet 1.10.1 in Spark 2.4), because the version bundled
> with Spark is loaded preferentially.
>
> Configurations such as {{userClassPathFirst}} exist for this purpose, but
> they often come with side effects of their own. For example, if the
> customer’s build includes Avro, they will likely see:
> {noformat}
> Caused by: java.lang.LinkageError: loader constraint violation: when resolving method
> "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
> the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class,
> com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance of
> sun/misc/Launcher$AppClassLoader) for the method's defining class, org/apache/spark/SparkConf,
> have different Class objects for the type scala/collection/Seq used in the signature
> {noformat}
> Resolving such issues often takes many hours.
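>
> For reference, the child-first behavior above is typically enabled like
> this (a minimal sketch; whether the error appears depends on which jars
> the application itself bundles):
> {code:scala}
> import org.apache.spark.SparkConf
>
> // Load the user's jars before Spark's own. This is what puts a
> // ChildFirstURLClassLoader in play and, with conflicting jars,
> // produces the LinkageError quoted above.
> val conf = new SparkConf()
>   .set("spark.driver.userClassPathFirst", "true")
>   .set("spark.executor.userClassPathFirst", "true")
> {code}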
>
> To deal with these issues, customers often download the Spark build,
> remove the offending jars, and then run {{spark-submit}}. Other times,
> customers cannot run {{spark-submit}} directly because it is gated behind
> a Spark Job Server; in that case they may download the build, remove the
> jars, and then use configurations such as {{spark.yarn.dist.jars}} or
> {{spark.yarn.dist.archives}}. Both options are undesirable: they are
> operationally heavy, error-prone, and often leave the customer’s Spark
> builds out of sync with the authoritative build.
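>
> A sketch of that second workaround (the paths are hypothetical, and the
> exact combination of configs varies by setup):
> {code:scala}
> import org.apache.spark.SparkConf
>
> // Ship a manually curated set of jars to the YARN containers. The
> // pruned/patched copies must be re-created for every Spark upgrade,
> // which is how builds drift out of sync with the authoritative one.
> val conf = new SparkConf()
>   .set("spark.yarn.dist.jars", "hdfs:///tmp/patched/parquet-1.11.0.jar")
>   .set("spark.yarn.dist.archives", "hdfs:///tmp/patched/spark-jars-pruned.zip")
> {code}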
>
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}}
> configuration. Customers could provide a regex such as {{.*parquet.*}},
> and jar files matching it would be excluded from the driver and executor
> classpaths.
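>
> A minimal sketch of how the proposed filtering might behave on the
> client side (the config key is the one proposed above; nothing below is
> an existing Spark API):
> {code:scala}
> import java.io.File
> import org.apache.spark.SparkConf
>
> val conf = new SparkConf()
> // Proposed key; an empty default means "exclude nothing".
> val exclusionRegex = conf.get("spark.yarn.jars.exclusionRegex", "")
>
> // Jars that would normally be packaged and shipped to containers.
> val allJars = new File(sys.env("SPARK_HOME"), "jars")
>   .listFiles().toSeq.filter(_.getName.endsWith(".jar"))
>
> // Drop any jar whose file name matches the regex, e.g. ".*parquet.*".
> val shippedJars =
>   if (exclusionRegex.isEmpty) allJars
>   else allJars.filterNot(_.getName.matches(exclusionRegex))
> {code}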
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]