[
https://issues.apache.org/jira/browse/SPARK-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-9035.
---------------------------------
Resolution: Incomplete
> Spark on Mesos Thread Context Class Loader issues
> -------------------------------------------------
>
> Key: SPARK-9035
> URL: https://issues.apache.org/jira/browse/SPARK-9035
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.2, 1.3.0, 1.3.1, 1.4.0
> Environment: Mesos on MapRFS.
> Reporter: John Omernik
> Priority: Critical
> Labels: bulk-closed
>
> There is an issue running Spark on Mesos (using MapRFS). I am able to run
> the same workload under YARN (using Myriad on Mesos) on the same cluster,
> just not directly on Mesos. I've corresponded with MapR, and the issue
> appears to be the thread context class loader being NULL. They will look at
> addressing it in their code as well, but the issue exists here too, since
> the desired behavior shouldn't be to pass NULL (see
> https://issues.apache.org/jira/browse/SPARK-1403). Note: I tried to reopen
> SPARK-1403, and Patrick Wendell asked me to open a new issue (this JIRA).
> Environment:
> MapR 4.1.0 (using MapRFS)
> Mesos 0.22.1
> Spark 1.4 (the issue occurs on Spark 1.3.1, 1.3.0, and 1.2.2, but not 1.2.0)
> Some comments from Kannan at MapR (he has since left MapR; these comments
> were made before he left):
> Here is the corresponding ShimLoader code; cl.getParent() is hitting an NPE.
> If you look at the Spark code base, you can see that setContextClassLoader
> is invoked in a few places, but not necessarily in the context of this stack
> trace.
> {code}
> private static ClassLoader getRootClassLoader() {
>   ClassLoader cl = Thread.currentThread().getContextClassLoader();
>   trace("getRootClassLoader: thread classLoader is '%s'",
>       cl.getClass().getCanonicalName());
>   while (cl.getParent() != null) {
>     cl = cl.getParent();
>   }
>   trace("getRootClassLoader: root classLoader is '%s'",
>       cl.getClass().getCanonicalName());
>   return cl;
> }
> {code}
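The NPE is easy to reproduce outside Mesos: when a thread's context class loader is explicitly null (as it apparently is for the executor thread here), the first cl.getParent() call dereferences null. Below is a minimal stand-alone sketch of the failure mode — not MapR's actual code, and the class and method names are hypothetical:

```java
public class TcclNpeRepro {
    // Runs the same parent-chain walk as ShimLoader.getRootClassLoader on a
    // thread whose context class loader has been set to null, and returns
    // whatever it throws (or null if the walk succeeds).
    static Throwable walkWithNullTccl() {
        final Throwable[] thrown = new Throwable[1];
        Thread t = new Thread(() -> {
            try {
                ClassLoader cl = Thread.currentThread().getContextClassLoader();
                while (cl.getParent() != null) {  // NPE here when cl == null
                    cl = cl.getParent();
                }
            } catch (Throwable e) {
                thrown[0] = e;
            }
        });
        t.setContextClassLoader(null);  // what the executor thread effectively has
        t.start();
        try {
            t.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return thrown[0];
    }

    public static void main(String[] args) {
        System.out.println(walkWithNullTccl());  // java.lang.NullPointerException
    }
}
```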
> MapR cannot handle NULL in this case. Basically, it is trying to get a root
> class loader to use for loading a bunch of classes. It uses the thread's
> context class loader (TCCL) and keeps walking up the parent chain. We could
> fall back to using the current class's class loader whenever the TCCL is
> NULL. I need to check with some folks what the impact would be; I don't
> know the specific reason for choosing the TCCL here.
> I have raised an internal bug to fall back to using the current class
> loader if the TCCL is not set. Let us also figure out whether there is a
> way for Spark to address this, if it is really a change in behavior on
> their side. I think we should still fix our code to not make this
> assumption, but since this is a core change, it may not get out soon.
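The fallback described above can be sketched as follows — an illustration of the proposed behavior, not MapR's actual patch, with hypothetical names:

```java
public class RootClassLoaderFallback {
    // Like ShimLoader.getRootClassLoader, but tolerates a null thread context
    // class loader (TCCL) by falling back to this class's own loader.
    static ClassLoader getRootClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Proposed fallback: use the current class's class loader instead.
            cl = RootClassLoaderFallback.class.getClassLoader();
        }
        // getParent() returns null once the next parent is the bootstrap loader.
        while (cl != null && cl.getParent() != null) {
            cl = cl.getParent();
        }
        return cl;
    }

    public static void main(String[] args) {
        // Even with the TCCL cleared, the walk now completes without an NPE.
        Thread.currentThread().setContextClassLoader(null);
        System.out.println(getRootClassLoader() != null);  // true
    }
}
```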
> Command attempted in bin/pyspark:
> {code}
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SQLContext, Row, HiveContext
> sparkhc = HiveContext(sc)
> test = sparkhc.sql("show tables")
> for r in test.collect():
>     print r
> {code}
> Stack Trace from CLI:
> {code}
> 15/07/14 09:16:40 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:58221] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:16:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
> 15/07/14 09:16:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:53763] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:16:48 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
> 15/07/14 09:16:53 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:52102] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:16:53 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
> 15/07/14 09:17:01 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:58600] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/07/14 09:17:01 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
> 15/07/14 09:17:01 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/pyspark/sql/dataframe.py", line 314, in collect
>     port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
>   File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
> Driver stacktrace:
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> Stack trace from STDERR on the failed Mesos task:
> {code}
> I0714 09:16:31.665690 21429 fetcher.cpp:214] Fetching URI '/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503.tgz'
> I0714 09:16:31.665841 21429 fetcher.cpp:194] Copying resource from '/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503.tgz' to '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f'
> I0714 09:16:35.624750 21429 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f/spark-1.4.0-bin-2.5.1-mapr-1503.tgz' into '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f'
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 15/07/14 09:16:39 INFO MesosExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> I0714 09:16:39.139713 21504 exec.cpp:132] Version: 0.22.1
> I0714 09:16:39.147428 21525 exec.cpp:206] Executor registered on slave 20150630-193234-1644210368-5050-10591-S3
> 15/07/14 09:16:39 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20150630-193234-1644210368-5050-10591-S3 with 1 cpus
> 15/07/14 09:16:39 INFO SecurityManager: Changing view acls to: darkness
> 15/07/14 09:16:39 INFO SecurityManager: Changing modify acls to: darkness
> 15/07/14 09:16:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(darkness); users with modify permissions: Set(darkness)
> 15/07/14 09:16:39 INFO Slf4jLogger: Slf4jLogger started
> 15/07/14 09:16:39 INFO Remoting: Starting remoting
> 15/07/14 09:16:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:58221]
> 15/07/14 09:16:39 INFO Utils: Successfully started service 'sparkExecutor' on port 58221.
> 15/07/14 09:16:39 INFO DiskBlockManager: Created local directory at /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42/blockmgr-54a629d3-5583-4c05-b478-dfee1ad5a113
> 15/07/14 09:16:39 INFO MemoryStore: MemoryStore started with capacity 1060.0 MB
> java.lang.NullPointerException
>   at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
>   at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
>   at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
>   at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
>   at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
>   at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
>   at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
>   at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
>   at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
>   at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
>   at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
>   at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
>   at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
>   at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>   at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
> java.lang.RuntimeException: Failure loading MapRClient.
>   at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:296)
>   at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
>   at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
>   at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
>   at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
>   at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
>   at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
>   at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
>   at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
>   at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
>   at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
>   at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
>   at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>   at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
> Caused by: java.lang.NullPointerException
>   at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
>   at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
>   ... 21 more
> java.lang.ExceptionInInitializerError
>   at com.mapr.fs.ShimLoader.load(ShimLoader.java:227)
>   at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
>   at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
>   at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
>   at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
>   at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
>   at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
>   at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
>   at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
>   at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
>   at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
>   at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
>   at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>   at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
> Caused by: java.lang.RuntimeException: Failure loading MapRClient.
>   at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:296)
>   at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
>   ... 20 more
> Caused by: java.lang.NullPointerException
>   at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
>   at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
>   ... 21 more
> Exception in thread "Thread-2" I0714 09:16:40.007040 21525 exec.cpp:413] Deactivating the executor libprocess
> 15/07/14 09:16:40 INFO DiskBlockManager: Shutdown hook called
> 15/07/14 09:16:40 INFO Utils: path = /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42/blockmgr-54a629d3-5583-4c05-b478-dfee1ad5a113, already present as root for deletion.
> 15/07/14 09:16:40 INFO Utils: Shutdown hook called
> 15/07/14 09:16:40 INFO Utils: Deleting directory /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)