[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Miner updated SPARK-18343:
-------------------------------
    Description:
I have a driver program where I read data in from Cassandra using Spark, perform some operations, and then write out to JSON on S3. The program runs fine when I use Spark 1.6.1 and the spark-cassandra-connector 1.6.0-M1.
However, if I try to upgrade to Spark 2.0.1 (hadoop 2.7.1) and spark-cassandra-connector 2.0.0-M3, the program completes in the sense that all the expected files are written to S3, but the program never terminates.
I do run `sc.stop()` at the end of the program. I am also using Mesos 1.0.1. In both cases I use the default output committer.
From the thread dump (included below) it seems like it could be waiting on:
`org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner`

Code snippet:
{code}
// get MongoDB oplog operations
val operations = sc.cassandraTable[JsonOperation](keyspace, namespace)
  .where("ts >= ? AND ts < ?", minTimestamp, maxTimestamp)

// replay oplog operations into documents
val documents = operations
  .spanBy(op => op.id)
  .map { case (id: String, ops: Iterable[T]) => (id, apply(ops)) }
  .filter { case (id, result) => result.isInstanceOf[Document] }
  .map { case (id, document) =>
    MergedDocument(id = id, document = document.asInstanceOf[Document])
  }

// write documents to json on s3
documents
  .map(document => document.toJson)
  .coalesce(partitions)
  .saveAsTextFile(path, classOf[GzipCodec])
sc.stop()
{code}

Thread dump on the driver:
{code}
60 context-cleaner-periodic-gc TIMED_WAITING
46 dag-scheduler-event-loop WAITING
4389 DestroyJavaVM RUNNABLE
12 dispatcher-event-loop-0 WAITING
13 dispatcher-event-loop-1 WAITING
14 dispatcher-event-loop-2 WAITING
15 dispatcher-event-loop-3 WAITING
47 driver-revive-thread TIMED_WAITING
3 Finalizer WAITING
82 ForkJoinPool-1-worker-17 WAITING
43 heartbeat-receiver-event-loop-thread TIMED_WAITING
93 java-sdk-http-connection-reaper TIMED_WAITING
4387 java-sdk-progress-listener-callback-thread WAITING
25 map-output-dispatcher-0 WAITING
26 map-output-dispatcher-1 WAITING
27 map-output-dispatcher-2 WAITING
28 map-output-dispatcher-3 WAITING
29 map-output-dispatcher-4 WAITING
30 map-output-dispatcher-5 WAITING
31 map-output-dispatcher-6 WAITING
32 map-output-dispatcher-7 WAITING
48 MesosCoarseGrainedSchedulerBackend-mesos-driver RUNNABLE
44 netty-rpc-env-timeout TIMED_WAITING
92 org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner WAITING
62 pool-19-thread-1 TIMED_WAITING
2 Reference Handler WAITING
61 Scheduler-1112394071 TIMED_WAITING
20 shuffle-server-0 RUNNABLE
55 shuffle-server-0 RUNNABLE
21 shuffle-server-1 RUNNABLE
56 shuffle-server-1 RUNNABLE
22 shuffle-server-2 RUNNABLE
57 shuffle-server-2 RUNNABLE
23 shuffle-server-3 RUNNABLE
58 shuffle-server-3 RUNNABLE
4 Signal Dispatcher RUNNABLE
59 Spark Context Cleaner TIMED_WAITING
9 SparkListenerBus WAITING
35 SparkUI-35-selector-ServerConnectorManager@651d3734/0 RUNNABLE
36 SparkUI-36-acceptor-0@467924cb-ServerConnector@3b5eaf92{HTTP/1.1}{0.0.0.0:4040} RUNNABLE
37 SparkUI-37-selector-ServerConnectorManager@651d3734/1 RUNNABLE
38 SparkUI-38 TIMED_WAITING
39 SparkUI-39 TIMED_WAITING
40 SparkUI-40 TIMED_WAITING
41 SparkUI-41 RUNNABLE
42 SparkUI-42 TIMED_WAITING
438 task-result-getter-0 WAITING
450 task-result-getter-1 WAITING
489 task-result-getter-2 WAITING
492 task-result-getter-3 WAITING
75 threadDeathWatcher-2-1 TIMED_WAITING
45 Timer-0 WAITING
{code}

Thread dump on the executors.
It's the same on all of them:
{code}
24 dispatcher-event-loop-0 WAITING
25 dispatcher-event-loop-1 WAITING
26 dispatcher-event-loop-2 RUNNABLE
27 dispatcher-event-loop-3 WAITING
39 driver-heartbeater TIMED_WAITING
3 Finalizer WAITING
58 java-sdk-http-connection-reaper TIMED_WAITING
75 java-sdk-progress-listener-callback-thread WAITING
1 main TIMED_WAITING
33 netty-rpc-env-timeout TIMED_WAITING
55 org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner WAITING
59 pool-17-thread-1 TIMED_WAITING
2 Reference Handler WAITING
28 shuffle-client-0 RUNNABLE
35 shuffle-client-0 RUNNABLE
41 shuffle-client-0 RUNNABLE
37 shuffle-server-0 RUNNABLE
5 Signal Dispatcher RUNNABLE
23 threadDeathWatcher-2-1 TIMED_WAITING
{code}

Jstack of an executor:
{code}
ubuntu@ip-10-0-230-88:~$ sudo jstack 21811
2016-11-08 21:38:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f8234003800 nid=0x5a4c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"java-sdk-progress-listener-callback-thread" daemon prio=10 tid=0x00007f8218001000 nid=0x55c5 waiting on condition [0x00007f81e98d5000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000078797f4f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"pool-17-thread-1" daemon prio=10 tid=0x00007f82141f9000 nid=0x5597 waiting on condition [0x00007f81fc2bb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000074d9008e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"java-sdk-http-connection-reaper" daemon prio=10 tid=0x00007f820837e000 nid=0x5596 waiting on condition [0x00007f81fc3bc000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:112)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" daemon prio=10 tid=0x00007f8208352800 nid=0x5594 in Object.wait() [0x00007f824cc13000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000756803100> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x0000000756803100> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3063)
    at java.lang.Thread.run(Thread.java:745)

"shuffle-client-0" daemon prio=10 tid=0x00007f8208110800 nid=0x5593 runnable [0x00007f824ca11000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x0000000756803238> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x0000000756803258> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000007568031f0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

"shuffle-client-0" daemon prio=10 tid=0x00007f820803b800 nid=0x5578 runnable [0x00007f824c704000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x00000007568033e0> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x0000000756803400> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000756803398> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

"driver-heartbeater" daemon prio=10 tid=0x00007f8200047800 nid=0x5573 waiting on condition [0x00007f81fdefb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000007568036b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"shuffle-server-0" daemon prio=10 tid=0x00007f8200044000 nid=0x5572 runnable [0x00007f81fdffc000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x00000007568038c0> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x00000007568038e0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000756803878> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

"shuffle-client-0" daemon prio=10 tid=0x000000000222c000 nid=0x5571 runnable [0x00007f824c1ff000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x0000000756803a68> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x0000000756803a88> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000756803a20> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

"netty-rpc-env-timeout" daemon prio=10 tid=0x00007f8285248000 nid=0x5570 waiting on condition [0x00007f824c300000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000756803b80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"dispatcher-event-loop-3" daemon prio=10 tid=0x00007f82851f4800 nid=0x556e waiting on condition [0x00007f824c502000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000756802418> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"dispatcher-event-loop-2" daemon prio=10 tid=0x00007f82851f3800 nid=0x556d waiting on condition [0x00007f824c805000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000756802418> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"dispatcher-event-loop-1" daemon prio=10 tid=0x00007f82851f3000 nid=0x556c waiting on condition [0x00007f824cf15000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000756802418> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"dispatcher-event-loop-0" daemon prio=10 tid=0x00007f82851f2000 nid=0x556b waiting on condition [0x00007f824c906000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000756802418> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f820400e000 nid=0x5567 waiting on condition [0x00007f824c603000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)

"Service Thread" daemon prio=10 tid=0x00007f82842ae000 nid=0x555a runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f82842ab000 nid=0x5559 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f82842a9000 nid=0x5558 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f82842a6800 nid=0x5557 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" daemon prio=10 tid=0x00007f82842a4800 nid=0x5556 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f8284282800 nid=0x5555 in Object.wait() [0x00007f8280dfc000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007568040a0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x00000007568040a0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007f8284280800 nid=0x5554 in Object.wait() [0x00007f8280efd000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007568040e0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000007568040e0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f8284021000 nid=0x5547 waiting on condition [0x00007f828da05000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000756804ac8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
    at org.apache.spark.rpc.netty.Dispatcher.awaitTermination(Dispatcher.scala:180)
    at org.apache.spark.rpc.netty.NettyRpcEnv.awaitTermination(NettyRpcEnv.scala:273)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:217)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)

"VM Thread" prio=10 tid=0x00007f828427c000 nid=0x5553 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f8284035800 nid=0x5548 runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f8284037800 nid=0x5549 runnable

"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f8284039000 nid=0x554a runnable

"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f828403b000 nid=0x554b runnable

"G1 Main Concurrent Mark GC Thread" prio=10 tid=0x00007f828404f800 nid=0x5551 runnable

"Gang worker#0 (G1 Parallel Marking Threads)" prio=10 tid=0x00007f8284062000 nid=0x5552 runnable

"G1 Concurrent Refinement Thread#0" prio=10 tid=0x00007f8284045800 nid=0x5550 runnable

"G1 Concurrent Refinement Thread#1" prio=10 tid=0x00007f8284043800 nid=0x554f runnable

"G1 Concurrent Refinement Thread#2" prio=10 tid=0x00007f8284041800 nid=0x554e runnable

"G1 Concurrent Refinement Thread#3" prio=10 tid=0x00007f828403f800 nid=0x554d runnable

"G1 Concurrent Refinement Thread#4" prio=10 tid=0x00007f828403e000 nid=0x554c runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f82842b8800 nid=0x555b waiting on condition

JNI global references: 358
{code}

> FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18343
>                 URL: https://issues.apache.org/jira/browse/SPARK-18343
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.1
>         Environment: Spark 2.0.1
>                      Hadoop 2.7.1
>                      Mesos 1.0.1
>                      Ubuntu 14.04
>            Reporter: Luke Miner
>
> I have a driver program where I read data in from Cassandra using
> Spark, perform some operations, and then write out to JSON on S3. The program
> runs fine when I use Spark 1.6.1 and the spark-cassandra-connector 1.6.0-M1.
> However, if I try to upgrade to Spark 2.0.1 (hadoop 2.7.1) and
> spark-cassandra-connector 2.0.0-M3, the program completes in the sense that
> all the expected files are written to S3, but the program never terminates.
> I do run `sc.stop()` at the end of the program. I am also using Mesos 1.0.1.
> In both cases I use the default output committer.
> From the thread dump (included below) it seems like it could be waiting on:
> `org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner`
> Code snippet:
> {code}
> // get MongoDB oplog operations
> val operations = sc.cassandraTable[JsonOperation](keyspace, namespace)
>   .where("ts >= ? AND ts < ?", minTimestamp, maxTimestamp)
>
> // replay oplog operations into documents
> val documents = operations
>   .spanBy(op => op.id)
>   .map { case (id: String, ops: Iterable[T]) => (id, apply(ops)) }
>   .filter { case (id, result) => result.isInstanceOf[Document] }
>   .map { case (id, document) =>
>     MergedDocument(id = id, document = document.asInstanceOf[Document])
>   }
>
> // write documents to json on s3
> documents
>   .map(document => document.toJson)
>   .coalesce(partitions)
>   .saveAsTextFile(path, classOf[GzipCodec])
> sc.stop()
> {code}
> Thread dump on the driver:
> {code}
> 60 context-cleaner-periodic-gc TIMED_WAITING
> 46 dag-scheduler-event-loop WAITING
> 4389 DestroyJavaVM RUNNABLE
> 12 dispatcher-event-loop-0 WAITING
> 13 dispatcher-event-loop-1 WAITING
> 14 dispatcher-event-loop-2 WAITING
> 15 dispatcher-event-loop-3 WAITING
> 47 driver-revive-thread TIMED_WAITING
> 3 Finalizer WAITING
> 82 ForkJoinPool-1-worker-17 WAITING
> 43 heartbeat-receiver-event-loop-thread TIMED_WAITING
> 93 java-sdk-http-connection-reaper TIMED_WAITING
> 4387 java-sdk-progress-listener-callback-thread WAITING
> 25 map-output-dispatcher-0 WAITING
> 26 map-output-dispatcher-1 WAITING
> 27 map-output-dispatcher-2 WAITING
> 28 map-output-dispatcher-3 WAITING
> 29 map-output-dispatcher-4 WAITING
> 30 map-output-dispatcher-5 WAITING
> 31 map-output-dispatcher-6 WAITING
> 32 map-output-dispatcher-7 WAITING
> 48 MesosCoarseGrainedSchedulerBackend-mesos-driver RUNNABLE
> 44 netty-rpc-env-timeout TIMED_WAITING
> 92 org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner WAITING
> 62 pool-19-thread-1 TIMED_WAITING
> 2 Reference Handler WAITING
> 61 Scheduler-1112394071 TIMED_WAITING
> 20 shuffle-server-0 RUNNABLE
> 55 shuffle-server-0 RUNNABLE
> 21 shuffle-server-1 RUNNABLE
> 56 shuffle-server-1 RUNNABLE
> 22 shuffle-server-2 RUNNABLE
> 57 shuffle-server-2 RUNNABLE
> 23 shuffle-server-3 RUNNABLE
> 58 shuffle-server-3 RUNNABLE
> 4 Signal Dispatcher RUNNABLE
> 59 Spark Context Cleaner TIMED_WAITING
> 9 SparkListenerBus WAITING
> 35 SparkUI-35-selector-ServerConnectorManager@651d3734/0 RUNNABLE
> 36 SparkUI-36-acceptor-0@467924cb-ServerConnector@3b5eaf92{HTTP/1.1}{0.0.0.0:4040} RUNNABLE
> 37 SparkUI-37-selector-ServerConnectorManager@651d3734/1 RUNNABLE
> 38 SparkUI-38 TIMED_WAITING
> 39 SparkUI-39 TIMED_WAITING
> 40 SparkUI-40 TIMED_WAITING
> 41 SparkUI-41 RUNNABLE
> 42 SparkUI-42 TIMED_WAITING
> 438 task-result-getter-0 WAITING
> 450 task-result-getter-1 WAITING
> 489 task-result-getter-2 WAITING
> 492 task-result-getter-3 WAITING
> 75 threadDeathWatcher-2-1 TIMED_WAITING
> 45 Timer-0 WAITING
> {code}
> Thread dump on the executors.
> It's the same on all of them:
> {code}
> 24 dispatcher-event-loop-0 WAITING
> 25 dispatcher-event-loop-1 WAITING
> 26 dispatcher-event-loop-2 RUNNABLE
> 27 dispatcher-event-loop-3 WAITING
> 39 driver-heartbeater TIMED_WAITING
> 3 Finalizer WAITING
> 58 java-sdk-http-connection-reaper TIMED_WAITING
> 75 java-sdk-progress-listener-callback-thread WAITING
> 1 main TIMED_WAITING
> 33 netty-rpc-env-timeout TIMED_WAITING
> 55 org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner WAITING
> 59 pool-17-thread-1 TIMED_WAITING
> 2 Reference Handler WAITING
> 28 shuffle-client-0 RUNNABLE
> 35 shuffle-client-0 RUNNABLE
> 41 shuffle-client-0 RUNNABLE
> 37 shuffle-server-0 RUNNABLE
> 5 Signal Dispatcher RUNNABLE
> 23 threadDeathWatcher-2-1 TIMED_WAITING
> {code}
> Jstack of an executor:
> {code}
> ubuntu@ip-10-0-230-88:~$ sudo jstack 21811
> 2016-11-08 21:38:02
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
>
> "Attach Listener" daemon prio=10 tid=0x00007f8234003800 nid=0x5a4c waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
> "java-sdk-progress-listener-callback-thread" daemon prio=10 tid=0x00007f8218001000 nid=0x55c5 waiting on condition [0x00007f81e98d5000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x000000078797f4f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> "pool-17-thread-1" daemon prio=10 tid=0x00007f82141f9000 nid=0x5597 waiting on condition [0x00007f81fc2bb000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x000000074d9008e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> "java-sdk-http-connection-reaper" daemon prio=10 tid=0x00007f820837e000 nid=0x5596 waiting on condition [0x00007f81fc3bc000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:112)
>
> "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" daemon prio=10 tid=0x00007f8208352800 nid=0x5594 in Object.wait() [0x00007f824cc13000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x0000000756803100> (a java.lang.ref.ReferenceQueue$Lock)
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>     - locked <0x0000000756803100> (a java.lang.ref.ReferenceQueue$Lock)
>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>     at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3063)
>     at java.lang.Thread.run(Thread.java:745)
>
> "shuffle-client-0" daemon prio=10 tid=0x00007f8208110800 nid=0x5593 runnable [0x00007f824ca11000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>     - locked <0x0000000756803238> (a io.netty.channel.nio.SelectedSelectionKeySet)
>     - locked <0x0000000756803258> (a java.util.Collections$UnmodifiableSet)
>     - locked <0x00000007568031f0> (a sun.nio.ch.EPollSelectorImpl)
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
>
> "shuffle-client-0" daemon prio=10 tid=0x00007f820803b800 nid=0x5578 runnable [0x00007f824c704000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>     - locked <0x00000007568033e0> (a io.netty.channel.nio.SelectedSelectionKeySet)
>     - locked <0x0000000756803400> (a java.util.Collections$UnmodifiableSet)
>     - locked <0x0000000756803398> (a sun.nio.ch.EPollSelectorImpl)
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at java.lang.Thread.run(Thread.java:745)
>
> "driver-heartbeater" daemon prio=10 tid=0x00007f8200047800 nid=0x5573 waiting on condition [0x00007f81fdefb000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000007568036b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> "shuffle-server-0" daemon prio=10 tid=0x00007f8200044000 nid=0x5572 runnable [0x00007f81fdffc000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>     - locked <0x00000007568038c0> (a io.netty.channel.nio.SelectedSelectionKeySet)
>     - locked <0x00000007568038e0> (a java.util.Collections$UnmodifiableSet)
>     - locked <0x0000000756803878> (a sun.nio.ch.EPollSelectorImpl)
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622)
>     at
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > "shuffle-client-0" daemon prio=10 tid=0x000000000222c000 nid=0x5571 runnable > [0x00007f824c1ff000] > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) > - locked <0x0000000756803a68> (a > io.netty.channel.nio.SelectedSelectionKeySet) > - locked <0x0000000756803a88> (a java.util.Collections$UnmodifiableSet) > - locked <0x0000000756803a20> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) > at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:622) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:310) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > "netty-rpc-env-timeout" daemon prio=10 tid=0x00007f8285248000 nid=0x5570 > waiting on condition [0x00007f824c300000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000756803b80> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) > at > java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090) > at > java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807) > at > 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "dispatcher-event-loop-3" daemon prio=10 tid=0x00007f82851f4800 nid=0x556e > waiting on condition [0x00007f824c502000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000756802418> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "dispatcher-event-loop-2" daemon prio=10 tid=0x00007f82851f3800 nid=0x556d > waiting on condition [0x00007f824c805000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000756802418> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "dispatcher-event-loop-1" daemon prio=10 tid=0x00007f82851f3000 nid=0x556c > waiting on condition [0x00007f824cf15000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000756802418> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "dispatcher-event-loop-0" daemon prio=10 tid=0x00007f82851f2000 nid=0x556b > waiting on condition [0x00007f824c906000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000756802418> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:207) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "threadDeathWatcher-2-1" daemon prio=10 tid=0x00007f820400e000 
nid=0x5567 > waiting on condition [0x00007f824c603000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:137) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > at java.lang.Thread.run(Thread.java:745) > "Service Thread" daemon prio=10 tid=0x00007f82842ae000 nid=0x555a runnable > [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > "C2 CompilerThread1" daemon prio=10 tid=0x00007f82842ab000 nid=0x5559 waiting > on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > "C2 CompilerThread0" daemon prio=10 tid=0x00007f82842a9000 nid=0x5558 waiting > on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > "Signal Dispatcher" daemon prio=10 tid=0x00007f82842a6800 nid=0x5557 runnable > [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > "Surrogate Locker Thread (Concurrent GC)" daemon prio=10 > tid=0x00007f82842a4800 nid=0x5556 waiting on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > "Finalizer" daemon prio=10 tid=0x00007f8284282800 nid=0x5555 in Object.wait() > [0x00007f8280dfc000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000007568040a0> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) > - locked <0x00000007568040a0> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) > at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) > "Reference Handler" daemon prio=10 tid=0x00007f8284280800 nid=0x5554 in > Object.wait() [0x00007f8280efd000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000007568040e0> (a java.lang.ref.Reference$Lock) > at java.lang.Object.wait(Object.java:503) > 
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) > - locked <0x00000007568040e0> (a java.lang.ref.Reference$Lock) > "main" prio=10 tid=0x00007f8284021000 nid=0x5547 waiting on condition > [0x00007f828da05000] > java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0000000756804ac8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) > at > java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468) > at > org.apache.spark.rpc.netty.Dispatcher.awaitTermination(Dispatcher.scala:180) > at > org.apache.spark.rpc.netty.NettyRpcEnv.awaitTermination(NettyRpcEnv.scala:273) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:217) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) > "VM Thread" prio=10 tid=0x00007f828427c000 nid=0x5553 runnable > "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f8284035800 > nid=0x5548 runnable > "Gang 
worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f8284037800 > nid=0x5549 runnable > "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f8284039000 > nid=0x554a runnable > "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f828403b000 > nid=0x554b runnable > "G1 Main Concurrent Mark GC Thread" prio=10 tid=0x00007f828404f800 nid=0x5551 > runnable > "Gang worker#0 (G1 Parallel Marking Threads)" prio=10 tid=0x00007f8284062000 > nid=0x5552 runnable > "G1 Concurrent Refinement Thread#0" prio=10 tid=0x00007f8284045800 nid=0x5550 > runnable > "G1 Concurrent Refinement Thread#1" prio=10 tid=0x00007f8284043800 nid=0x554f > runnable > "G1 Concurrent Refinement Thread#2" prio=10 tid=0x00007f8284041800 nid=0x554e > runnable > "G1 Concurrent Refinement Thread#3" prio=10 tid=0x00007f828403f800 nid=0x554d > runnable > "G1 Concurrent Refinement Thread#4" prio=10 tid=0x00007f828403e000 nid=0x554c > runnable > "VM Periodic Task Thread" prio=10 tid=0x00007f82842b8800 nid=0x555b waiting > on condition > JNI global references: 358 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org