[ 
https://issues.apache.org/jira/browse/SPARK-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281603#comment-15281603
 ] 

Sean Owen commented on SPARK-14638:
-----------------------------------

These are always pretty tricky to reason about. Generally the problem is that 
something is in your app's classloader but not Spark's; putting it (also) in 
Spark's classloader via the classpath can often "work", even if it's 
potentially messy.

You should find that there is a cause for the failure to initialize; somewhere 
in the logs it should say what's not found.
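
For context, "Could not initialize class X" usually does not mean the class 
file is missing; it means the class was found, but its static initializer 
already failed once, and every later reference then fails with this bare 
NoClassDefFoundError. A minimal sketch of that JVM behavior (all names below 
are made up for illustration; this is not Spark or HBase code):

{code}
object InitFailureDemo {
  // An object whose static initializer throws, e.g. because some transitive
  // dependency or configuration is missing at runtime.
  object Flaky {
    val value: String = sys.error("simulated failure during static initialization")
  }

  def main(args: Array[String]): Unit = {
    // First access: java.lang.ExceptionInInitializerError carrying the real cause.
    try Flaky.value catch { case t: Throwable => println(s"first use:  $t") }
    // Every later access: java.lang.NoClassDefFoundError: Could not initialize class ...
    // with no cause attached, which matches the stack traces quoted below.
    try Flaky.value catch { case t: Throwable => println(s"second use: $t") }
  }
}
{code}

So the first occurrence in the full executor log (or an earlier 
ExceptionInInitializerError) is usually the one that points at the actual root 
cause.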

I would consider first whether you're shipping, for instance, HBase code you 
don't need to.


> Spark task does not have access to a dependency in the classloader of the 
> executor thread
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-14638
>                 URL: https://issues.apache.org/jira/browse/SPARK-14638
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.1, 1.4.1, 1.6.0, 1.6.1
>         Environment: > uname -a
> Linux HOSTNAME 3.13.0-74-generic #118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015 
> x86_64 x86_64 x86_64 GNU/Linux
> > java -version
> java version "1.8.0_77"
> Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
>            Reporter: Younos Aboulnaga
>
> We have started to frequently see Spark apps failing because of a 
> NoClassDefFoundError thrown even though the dependency had been added to the 
> ClassLoader just before the error was thrown. The [Executor's run method adds 
> the JAR|https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/executor/Executor.scala#L193] 
> containing the class, but a NoClassDefFoundError is thrown subsequently. 
> We see log messages from 
> [updateDependencies|https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/executor/Executor.scala#L386]
>  indicating that the JAR is fetched and added to the class loader. Upon 
> inspection of the worker dir, the JAR is there, it is not corrupted, and it 
> contains the class that could not be found in the class loader. 
> We first saw this when we started writing streaming apps, and we thought it 
> was something specific to streaming apps. However, this turned out to be 
> wrong, as the same problem happened with several batch apps. 
> We first saw this on a Standalone cluster, and we thought that it might be a 
> problem caused by the lack of a resource manager. We have since installed 
> Mesos, and the problem still happens. 
> I tried to create a POC Spark app that demonstrates the problem, but I 
> couldn't reliably reproduce it. The problem would still happen in other apps, 
> but it didn't happen in the POC app even though I made it structurally the 
> same as any other app we run. The problem seems to be environmental, 
> especially because we found a workaround for it.
> The workaround we found is setting SPARK_CLASSPATH *on the executor nodes* 
> to a local copy of the dependency. The problem still happens if we set 
> 'spark.executor.extraClassPath' or 'spark.driver.extraClassPath', or set 
> SPARK_CLASSPATH on the driver node. However, if SPARK_CLASSPATH is set on 
> the executor nodes, then the problem doesn't happen, because the JAR doesn't 
> need to be added to the class loader by Executor#updateDependencies.
> Other symptoms of the problem are the following:
> 1) Even though there is a 'log4j.properties' on the 
> 'spark.executor.extraClassPath', the first line of the worker's stderr 
> says "Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties". The log4j.properties file that 
> is shipped with the job is completely ignored. 
> 2) Any configuration files on 'spark.executor.extraClassPath' are ignored. 
> I am mentioning this because log4j.properties is loaded very early on and in 
> a static call, which might sway the troubleshooting in the wrong direction.
> Here is the specific example in our case:
> > grep NoClassDef workers/app-20160414111328-0043/0/stderr
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil
> .. SEVERAL ATTEMPTS ...
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil 
> Yet, in the same application's worker dir:
> > for j in workers/app-20160414111328-0043/0/*.jar ; do jar tf $j | grep 
> > ProtobufUtil ; done;
> org/apache/hadoop/hbase/protobuf/ProtobufUtil$1.class
> org/apache/hadoop/hbase/protobuf/ProtobufUtil.class
> There are other examples, especially of configurations not being found. I 
> think SPARK-12279 can also be caused by the same root cause.
> We have been seeing this in several of our clusters, and several engineers 
> have spent days looking into why their applications suffer from this. We 
> rebuilt our infrastructure (always on AWS EC2 nodes) and tested many 
> hypotheses, including things that are nonsensical, and we still can't find 
> anything that reliably reproduces the problem. The only reliable piece of 
> information is that setting SPARK_CLASSPATH *on the executor nodes* prevents 
> the problem from happening, because then the dependencies are included in the 
> -cp parameter of the java command running the CoarseGrainedExecutorBackend.
> We would appreciate it if someone more knowledgeable about Spark internals 
> could take a look, and we can help by providing as many details as possible.
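
The dynamic-classpath mechanism described in the report boils down to the 
executor fetching the JAR into the worker dir and appending it to the task 
class loader at runtime, roughly like the simplified sketch below (this is not 
Spark's actual code; the path and JAR name are placeholders, and only the 
ProtobufUtil class name is taken from the report):

{code}
import java.io.File
import java.net.URLClassLoader

object DynamicClasspathSketch {
  def main(args: Array[String]): Unit = {
    // Simplified stand-in for Executor#updateDependencies: the JAR shipped with
    // the job ends up in the worker dir and is added to a URL class loader at
    // runtime instead of being on the JVM's -cp at launch.
    val fetchedJar = new File("workers/app-20160414111328-0043/0/some-dependency.jar") // placeholder
    val taskLoader = new URLClassLoader(Array(fetchedJar.toURI.toURL), getClass.getClassLoader)

    // Task code then resolves classes through that loader. If static initialization
    // fails at this point (missing config, conflicting transitive dependency, ...),
    // later lookups surface as "NoClassDefFoundError: Could not initialize class ...".
    val cls = Class.forName("org.apache.hadoop.hbase.protobuf.ProtobufUtil", true, taskLoader)
    println(s"loaded ${cls.getName} from ${cls.getProtectionDomain.getCodeSource}")
  }
}
{code}

With the SPARK_CLASSPATH workaround described in the report, the same JAR is 
instead part of the -cp of the java command that launches 
CoarseGrainedExecutorBackend, so this dynamic step is skipped for that 
dependency.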



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
