Maziyar PANAHI created ZEPPELIN-3986:
----------------------------------------

             Summary: Cannot access any JAR in yarn cluster mode
                 Key: ZEPPELIN-3986
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3986
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.8.1, 0.8.2
         Environment: Cloudera/CDH 6.1

Spark 2.4

Hadoop 3.0

Zeppelin 0.8.2 (built from the latest merged pull request)
            Reporter: Maziyar PANAHI


Hello,

YARN cluster mode was introduced in `0.8.0`, and the problem of not finding 
ZeppelinContext was fixed in `0.8.1`. However, I am having difficulty accessing 
any JAR in order to `import` its classes inside my notebook.

I have a CDH cluster where everything works in deployMode `client`, but the 
moment I switch to `cluster`, and the driver is no longer on the same machine 
as the Zeppelin server, it can't find the packages.

Working configs:

Inside interpreter:

master: yarn

spark.submit.deployMode: client

Inside `zeppelin-env.sh`:

 
{code:java}
export SPARK_SUBMIT_OPTIONS="--jars hdfs:///user/maziyar/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar"
{code}
 

Since the JAR is already on HDFS, switching to cluster mode should be as simple 
as changing `spark.submit.deployMode` to `cluster`. However, doing that results 
in:

 
{code:java}
import org.graphframes._

<console>:23: error: object graphframes is not a member of package org
       import org.graphframes._
{code}
I can see my JAR in the Spark UI under `spark.yarn.dist.jars` and 
`spark.yarn.secondary.jars` in both cluster and client mode.

 

In client mode, `sc.jars` returns:

 
{code:java}
res2: Seq[String] = 
List(file:/opt/zeppelin-0.8.2-new/interpreter/spark/spark-interpreter-0.8.2-SNAPSHOT.jar){code}
 

However, in `cluster` mode the same command returns an empty list. I thought 
maybe there is something extra or missing in the Zeppelin Spark interpreter 
that does not allow the JAR to be used in cluster mode.
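
A minimal sketch of a notebook paragraph that could be run in both modes to compare the values side by side. It assumes `sc` is the SparkContext that Zeppelin provides in a `%spark` paragraph, and that `org.graphframes.GraphFrame` is a class shipped in the graphframes JAR; the runtime check only probes the driver JVM, so it may not reflect what the REPL compiler sees.

{code:scala}
import scala.util.Try

// Assumption: `sc` is the SparkContext that Zeppelin provides in a %spark paragraph.
val conf = sc.getConf

// Settings discussed in this report; a key that is absent prints as <not set>.
val keys = Seq(
  "spark.submit.deployMode",
  "spark.yarn.dist.jars",
  "spark.yarn.secondary.jars",
  "spark.repl.local.jars"
)
keys.foreach { key =>
  val value = conf.getOption(key).getOrElse("<not set>")
  println(key + " = " + value)
}

// JARs the SparkContext itself knows about.
println("sc.jars = " + sc.jars.mkString(", "))

// Assumption: org.graphframes.GraphFrame is a class inside the graphframes JAR.
// This only checks whether the driver JVM can load it at runtime, which is
// separate from whether the REPL compiler can resolve the import.
val loadable = Try(Class.forName("org.graphframes.GraphFrame")).isSuccess
println("GraphFrame loadable at runtime: " + loadable)
{code}

If the class turns out to be loadable at runtime while the `import` still fails, that would suggest the JAR reaches the driver container but is not added to the interpreter's compile classpath.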

 

This is how Spark UI reports my JAR in `client` mode:

|spark.repl.local.jars|file:/tmp/spark-3aadfe3c-8821-4dfe-875b-744c2e35a95a/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|spark.yarn.dist.jars|hdfs://hadoop-master-1:8020/user/mpanahi/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|spark.yarn.secondary.jars|graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|sun.java.command|org.apache.spark.deploy.SparkSubmit --master yarn --conf spark.executor.memory=5g --conf spark.driver.memory=8g --conf spark.driver.cores=4 --conf spark.yarn.isPython=true --conf spark.driver.extraClassPath=:/opt/zeppelin-0.8.2-new/interpreter/spark/*:/opt/zeppelin-0.8.2-new/zeppelin-interpreter/target/lib/*::/opt/zeppelin-0.8.2-new/zeppelin-interpreter/target/classes:/opt/zeppelin-0.8.2-new/zeppelin-interpreter/target/test-classes:/opt/zeppelin-0.8.2-new/zeppelin-zengine/target/test-classes:/opt/zeppelin-0.8.2-new/interpreter/spark/spark-interpreter-0.8.2-SNAPSHOT.jar --conf spark.useHiveContext=true --conf spark.app.name=Zeppelin --conf spark.executor.cores=5 --conf spark.submit.deployMode=client --conf spark.dynamicAllocation.maxExecutors=50 --conf spark.dynamicAllocation.initialExecutors=1 --conf spark.dynamicAllocation.enabled=true --conf spark.driver.extraJavaOptions= -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///opt/zeppelin-0.8.2-new/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-mpanahi-zeppelin-hadoop-gateway.log --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --jars hdfs:///user/mpanahi/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar,|

 

This is how Spark UI reports my JAR in `cluster` mode (same configs as I 
mentioned above):

  
|spark.repl.local.jars|This field does not exist in cluster mode|
|spark.yarn.dist.jars|hdfs://hadoop-master-1:8020/user/mpanahi/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|spark.yarn.secondary.jars|graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|sun.java.command|org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --jar file:/opt/zeppelin-0.8.2-new/interpreter/spark/spark-interpreter-0.8.2-SNAPSHOT.jar --arg 134.158.74.122 --arg 46130 --arg : --properties-file /yarn/nm/usercache/mpanahi/appcache/application_1547731772080_0077/container_1547731772080_0077_01_000001/__spark_conf__/__spark_conf__.properties|

 

Thank you.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
