[ https://issues.apache.org/jira/browse/SPARK-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-9035.
---------------------------------
    Resolution: Incomplete

> Spark on Mesos Thread Context Class Loader issues
> -------------------------------------------------
>
>                 Key: SPARK-9035
>                 URL: https://issues.apache.org/jira/browse/SPARK-9035
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.2, 1.3.0, 1.3.1, 1.4.0
>         Environment: Mesos on MapRFS. 
>            Reporter: John Omernik
>            Priority: Critical
>              Labels: bulk-closed
>
> There is an issue running Spark on Mesos (using MapRFS). I am able to run 
> the same workload on YARN (using Myriad on Mesos) on the same cluster, just 
> not directly on Mesos. I've corresponded with MapR, and the issue appears to 
> be the thread context class loader being NULL. They will look at addressing 
> it in their code as well, but the issue also exists here, since the desired 
> behavior shouldn't be to pass NULL (see 
> https://issues.apache.org/jira/browse/SPARK-1403). Note: I tried to reopen 
> SPARK-1403, and Patrick Wendell asked me to open a new issue (this JIRA).
> Environment:
> MapR 4.1.0 (using MapRFS)
> Mesos 0.22.1
> Spark 1.4.0 (the issue occurs on Spark 1.3.1, 1.3.0, and 1.2.2, but not 1.2.0)
> Some comments from Kannan at MapR (he is no longer with MapR; these comments 
> were made prior to him leaving):
> Here is the corresponding ShimLoader code; the cl.getParent() call is 
> hitting the NPE. If you look at the Spark code base, you can see that 
> setContextClassLoader is invoked in a few places, but not necessarily in the 
> context of this stack trace.
> {code}
>   private static ClassLoader getRootClassLoader() {
>     ClassLoader cl = Thread.currentThread().getContextClassLoader();
>     trace("getRootClassLoader: thread classLoader is '%s'",
>           cl.getClass().getCanonicalName());
>     while (cl.getParent() != null) {
>       cl = cl.getParent();
>     }
>     trace("getRootClassLoader: root classLoader is '%s'",
>           cl.getClass().getCanonicalName());
>     return cl;
>   }
> {code}
>   MapR cannot handle NULL in this case. Basically, it is trying to get a 
> root classloader to use for loading a bunch of classes. It uses the thread's 
> context class loader (TCCL) and keeps going up the parent chain. We could 
> fall back to using the current class's classloader whenever the TCCL is 
> NULL. I need to check with some folks what the impact would be; I don't know 
> the specific reason for choosing the TCCL here.
>   I have raised an internal bug to fall back to using the current class 
> loader if the TCCL is not set. Let us also figure out if there is a way for 
> Spark to address this - if it is really a change in behavior on their side. 
> I think we should still fix our code to not make this assumption, but since 
> this is a core change, it may not get out soon.
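The fallback described above can be sketched roughly as follows. This is an illustrative sketch only, not MapR's actual ShimLoader code; the class and method names are hypothetical. It shows the two guards being discussed: substituting the current class's loader when the TCCL is null, and not dereferencing a null loader while walking the parent chain.

```java
// Illustrative sketch of the proposed fallback (hypothetical names,
// not MapR's actual ShimLoader implementation).
public class RootLoaderFallback {

    static ClassLoader getRootClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // TCCL is unset -- the state the Mesos executor hits in this
            // report. Fall back to the loader that loaded this class.
            cl = RootLoaderFallback.class.getClassLoader();
        }
        // Walk up the parent chain without ever dereferencing null.
        while (cl != null && cl.getParent() != null) {
            cl = cl.getParent();
        }
        return cl;
    }

    public static void main(String[] args) {
        // Simulate the failing environment: no context class loader set.
        Thread.currentThread().setContextClassLoader(null);
        // With the fallback in place, no NullPointerException is thrown.
        System.out.println(getRootClassLoader() != null);
    }
}
```

With the original code quoted above, setting the TCCL to null reproduces the NPE immediately; with the fallback, the walk starts from the defining class's loader instead.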
> Command Attempted in bin/pyspark:
> {code}
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SQLContext, Row, HiveContext
> sparkhc = HiveContext(sc)
> test = sparkhc.sql("show tables")
> for r in test.collect():
>   print r
> {code}
> Stack Trace from CLI:
> {code}
> 15/07/14 09:16:40 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:58221] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:16:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
> 15/07/14 09:16:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:53763] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:16:48 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
> 15/07/14 09:16:53 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:52102] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:16:53 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
> 15/07/14 09:17:01 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:58600] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:17:01 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
> 15/07/14 09:17:01 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/pyspark/sql/dataframe.py", line 314, in collect
>     port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
>   File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
> Driver stacktrace:
>       at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
>       at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>       at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>       at scala.Option.foreach(Option.scala:236)
>       at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
>       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> Stack Trace from STDERR on Failed Mesos Task:
> {code}
> I0714 09:16:31.665690 21429 fetcher.cpp:214] Fetching URI '/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503.tgz'
> I0714 09:16:31.665841 21429 fetcher.cpp:194] Copying resource from '/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503.tgz' to '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f'
> I0714 09:16:35.624750 21429 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f/spark-1.4.0-bin-2.5.1-mapr-1503.tgz' into '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f'
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 15/07/14 09:16:39 INFO MesosExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> I0714 09:16:39.139713 21504 exec.cpp:132] Version: 0.22.1
> I0714 09:16:39.147428 21525 exec.cpp:206] Executor registered on slave 20150630-193234-1644210368-5050-10591-S3
> 15/07/14 09:16:39 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20150630-193234-1644210368-5050-10591-S3 with 1 cpus
> 15/07/14 09:16:39 INFO SecurityManager: Changing view acls to: darkness
> 15/07/14 09:16:39 INFO SecurityManager: Changing modify acls to: darkness
> 15/07/14 09:16:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(darkness); users with modify permissions: Set(darkness)
> 15/07/14 09:16:39 INFO Slf4jLogger: Slf4jLogger started
> 15/07/14 09:16:39 INFO Remoting: Starting remoting
> 15/07/14 09:16:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:58221]
> 15/07/14 09:16:39 INFO Utils: Successfully started service 'sparkExecutor' on port 58221.
> 15/07/14 09:16:39 INFO DiskBlockManager: Created local directory at /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42/blockmgr-54a629d3-5583-4c05-b478-dfee1ad5a113
> 15/07/14 09:16:39 INFO MemoryStore: MemoryStore started with capacity 1060.0 MB
> java.lang.NullPointerException
>       at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
>       at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
>       at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
>       at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
>       at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
>       at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
>       at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
>       at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
>       at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
>       at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
>       at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
>       at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
>       at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
>       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
>       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
>       at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>       at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>       at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
> java.lang.RuntimeException: Failure loading MapRClient.
>       at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:296)
>       at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
>       at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
>       at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
>       at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
>       at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
>       at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
>       at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
>       at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
>       at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
>       at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
>       at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
>       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
>       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
>       at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>       at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>       at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
> Caused by: java.lang.NullPointerException
>       at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
>       at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
>       ... 21 more
> java.lang.ExceptionInInitializerError
>       at com.mapr.fs.ShimLoader.load(ShimLoader.java:227)
>       at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
>       at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
>       at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
>       at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
>       at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
>       at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
>       at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
>       at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
>       at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
>       at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
>       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
>       at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
>       at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>       at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>       at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
> Caused by: java.lang.RuntimeException: Failure loading MapRClient. 
>       at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:296)
>       at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
>       ... 20 more
> Caused by: java.lang.NullPointerException
>       at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
>       at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
>       ... 21 more
> Exception in thread "Thread-2" I0714 09:16:40.007040 21525 exec.cpp:413] 
> Deactivating the executor libprocess
> 15/07/14 09:16:40 INFO DiskBlockManager: Shutdown hook called
> 15/07/14 09:16:40 INFO Utils: path = /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42/blockmgr-54a629d3-5583-4c05-b478-dfee1ad5a113, already present as root for deletion.
> 15/07/14 09:16:40 INFO Utils: Shutdown hook called
> 15/07/14 09:16:40 INFO Utils: Deleting directory /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
