[ https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959179#comment-16959179 ]

Marcelo Masiero Vanzin commented on SPARK-29472:
------------------------------------------------

bq. customers sometimes need to debug using different versions of dependencies

That's trivial to do with Spark-on-YARN.

{code}
spark-submit --deploy-mode cluster \
  --files /path/to/my-custom-parquet.jar \
  --conf spark.driver.extraClassPath=my-custom-parquet.jar \
  --conf spark.executor.extraClassPath=my-custom-parquet.jar \
  my-app.jar  # application jar and paths are placeholders
{code}

Or in client mode:

{code}
spark-submit --deploy-mode client \
  --files /path/to/my-custom-parquet.jar \
  --conf spark.driver.extraClassPath=/path/to/my-custom-parquet.jar \
  --conf spark.executor.extraClassPath=my-custom-parquet.jar \
  my-app.jar  # application jar and paths are placeholders
{code}
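
(In cluster mode the driver runs inside a YARN container, where files 
distributed with {{--files}} land in the container's working directory, so a 
bare file name works for both driver and executor. In client mode the driver 
runs locally and needs the full local path, while the executors still pick the 
jar up from their own working directories.)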

Done. No need for a new option, no need to change Spark's install directory, no 
need for {{userClassPathFirst}} or anything. I don't see the point of adding 
the new option - it's confusing, it makes it easy to break things, and it 
doesn't completely solve the problem by itself, since you still have to upload 
the new jar and add it to the class path with other existing options.

> Mechanism for Excluding Jars at Launch for YARN
> -----------------------------------------------
>
>                 Key: SPARK-29472
>                 URL: https://issues.apache.org/jira/browse/SPARK-29472
>             Project: Spark
>          Issue Type: New Feature
>          Components: YARN
>    Affects Versions: 2.4.4
>            Reporter: Abhishek Modi
>            Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from Spark’s 
> classpath at launch time. This would complement the way in which jars can be 
> added to the classpath using {{extraClassPath}}.
>  
> *Context*
> The Spark build contains its dependency jars in the {{/jars}} directory. 
> These jars become part of the executor’s classpath. By default on YARN, these 
> jars are packaged and distributed to containers at launch ({{spark-submit}}) 
> time.
>  
> While developing Spark applications, customers sometimes need to debug using 
> different versions of dependencies. This can become difficult if the 
> dependency (e.g. Parquet 1.11.0) is one that Spark already ships in {{/jars}} 
> (e.g. Parquet 1.10.1 in Spark 2.4), as the version included with Spark is 
> preferentially loaded.
>  
> Configurations such as {{userClassPathFirst}} are available. However, these 
> often come with side effects of their own. For example, if the customer’s build 
> includes Avro they will likely see {{Caused by: java.lang.LinkageError: 
> loader constraint violation: when resolving method 
> "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
>  the class loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) of the current class, 
> com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance 
> of sun/misc/Launcher$AppClassLoader) for the method's defining class, 
> org/apache/spark/SparkConf, have different Class objects for the type 
> scala/collection/Seq used in the signature}}. Resolving such issues often 
> takes many hours.
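>  
> For illustration, a minimal child-first setup of the kind that triggers this 
> error might look like the following (the jar version and application jar are 
> placeholders):
>  
> {code}
> spark-submit --deploy-mode cluster \
>   --jars /path/to/parquet-hadoop-1.11.0.jar \
>   --conf spark.driver.userClassPathFirst=true \
>   --conf spark.executor.userClassPathFirst=true \
>   my-app.jar
> {code}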
>  
> To deal with these sorts of issues, customers often download the Spark build, 
> remove the offending jars, and then run {{spark-submit}}. Other times, 
> customers may not be able to run {{spark-submit}} directly because it is gated 
> behind a Spark Job Server; in that case they may try downloading the build, 
> removing the jars, and then using configurations such as 
> {{spark.yarn.dist.jars}} or {{spark.yarn.dist.archives}}, as sketched below. 
> Both of these options are undesirable: they are operationally heavy, 
> error-prone, and often leave the customer’s Spark builds out of sync with the 
> authoritative build.
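>  
> A rough sketch of that manual workaround, assuming {{spark.yarn.archive}} is 
> used to ship the stripped jar set (version numbers and paths are 
> illustrative):
>  
> {code}
> # unpack the distribution and drop the conflicting jar
> tar xzf spark-2.4.4-bin-hadoop2.7.tgz
> rm spark-2.4.4-bin-hadoop2.7/jars/parquet-hadoop-1.10.1.jar
> # repackage the remaining jars and upload them
> jar cf spark-jars.zip -C spark-2.4.4-bin-hadoop2.7/jars .
> hdfs dfs -put spark-jars.zip /tmp/spark-jars.zip
> # point YARN at the stripped jar set
> spark-submit --deploy-mode cluster \
>   --conf spark.yarn.archive=hdfs:///tmp/spark-jars.zip \
>   my-app.jar
> {code}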
>  
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} 
> configuration. Customers could provide a regex such as {{.*parquet.*}}, and 
> jar files matching this regex would not be included in the driver and 
> executor classpaths.
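>  
> For example (hypothetical, since the option does not exist today):
>  
> {code}
> spark-submit --deploy-mode cluster \
>   --conf spark.yarn.jars.exclusionRegex=".*parquet.*" \
>   my-app.jar
> {code}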


