[
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362400#comment-15362400
]
Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 12:47 PM:
----------------------------------------------------------------------
Thread dump of repl added.
waiting here
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L135
but before that:
"main" #1 prio=5 os_prio=0 tid=0x00007f0bc8011800 nid=0x1269 waiting on
condition [0x00007f0bcfa24000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000ed7b8558> (a
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at
org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$class.startScheduler(MesosSchedulerUtils.scala:135)
- locked <0x00000000ed78bb68> (a
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend)
and blocked here:
Thread-15" #67 prio=5 os_prio=0 tid=0x00007f0abc004000 nid=0x12b0 waiting for
monitor entry [0x00007f0ad37fd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.log$lzycompute(CoarseGrainedSchedulerBackend.scala:43)
- waiting to lock <0x00000000ed78bb68> (a
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend)
MesosCoarseGrainedSchedulerBackend inherits from CoarseGrainedSchedulerBackend
so the code cannot proceed :)
was (Author: skonto):
Thread dump of repl added.
Locked here
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L135
and here:
Thread-15" #67 prio=5 os_prio=0 tid=0x00007f0abc004000 nid=0x12b0 waiting for
monitor entry [0x00007f0ad37fd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.log$lzycompute(CoarseGrainedSchedulerBackend.scala:43)
- waiting to lock <0x00000000ed78bb68> (a
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend)
MesosCoarseGrainedSchedulerBackend inherits from CoarseGrainedSchedulerBackend
so the code cannot proceed :)
> Spark on mesos is broken due to race condition in Logging
> ---------------------------------------------------------
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.0
> Reporter: Stavros Kontopoulos
> Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val:
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617 9303 group.cpp:700] Trying to get
> '/mesos/json.info_0000000152' in ZooKeeper
> I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master
> ([email protected]:5050) is detected
> I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at
> [email protected]:5050
> I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided.
> Attempting to register without authentication
> I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as
> expected.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]