keith-turner opened a new issue #1069: Saw deadlock in fluo map reduce load job
URL: https://github.com/apache/fluo/issues/1069

While running the stress test, I saw MapReduce jobs hang when trying to close. Running jstack on a map task process showed the following deadlock.

```
"main" #1 prio=5 os_prio=0 tid=0x00007fd42c016800 nid=0x5774 waiting on condition [0x00007fd434d87000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000ef5abf08> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.apache.fluo.core.impl.SharedBatchWriter.close(SharedBatchWriter.java:192)
	at org.apache.fluo.core.impl.SharedResources.close(SharedResources.java:211)
	- locked <0x00000000ef008a40> (a org.apache.fluo.core.impl.SharedResources)
	at org.apache.fluo.core.impl.Environment.close(Environment.java:254)
	at org.apache.fluo.core.client.FluoClientImpl.close(FluoClientImpl.java:116)
	at org.apache.fluo.mapreduce.FluoOutputFormat$2.close(FluoOutputFormat.java:96)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:682)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:805)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
```

```
"Fluo-0001-001-sharedBW" #53 daemon prio=5 os_prio=0 tid=0x00007fd42d935800 nid=0x59ef waiting for monitor entry [0x00007fd411154000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.fluo.core.impl.SharedResources.getTimestampTracker(SharedResources.java:138)
	- waiting to lock <0x00000000ef008a40> (a org.apache.fluo.core.impl.SharedResources)
	at org.apache.fluo.core.impl.TransactionImpl.close(TransactionImpl.java:771)
	- locked <0x00000000eff41da0> (a org.apache.fluo.core.impl.TransactionImpl)
	at org.apache.fluo.core.impl.TransactionImpl.close(TransactionImpl.java:777)
	at org.apache.fluo.core.async.CommitManager$CQCommitObserver.finish(CommitManager.java:68)
	at org.apache.fluo.core.async.CommitManager$CQCommitObserver.committed(CommitManager.java:87)
	at org.apache.fluo.core.impl.TransactionImpl$FinishCommitStep.lambda$getMainOp$0(TransactionImpl.java:1377)
	at org.apache.fluo.core.impl.TransactionImpl$FinishCommitStep$$Lambda$88/441397001.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
	at org.apache.fluo.core.impl.SharedBatchWriter$MutationBatch.countDown(SharedBatchWriter.java:74)
	at org.apache.fluo.core.impl.SharedBatchWriter$FlushTask.processBatches(SharedBatchWriter.java:123)
	at org.apache.fluo.core.impl.SharedBatchWriter$FlushTask.run(SharedBatchWriter.java:93)
	at java.lang.Thread.run(Thread.java:748)
```

The `main` thread holds the `SharedResources` monitor (`<0x00000000ef008a40>`) while it waits on a `CountDownLatch` in `SharedBatchWriter.close`. Meanwhile the `Fluo-0001-001-sharedBW` thread, running `SharedBatchWriter$FlushTask`, is blocked trying to acquire that same monitor in `SharedResources.getTimestampTracker`, so it cannot finish the flush that `close` is waiting for.
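To make the lock-plus-latch cycle easier to see, here is a minimal, self-contained Java sketch of the same pattern. It does not use Fluo: `Resources`, `closeHoldingLock`, `closeWithoutLock` and `runDemo` are hypothetical names chosen for illustration. The close path waits on a `CountDownLatch` while holding a monitor, and a writer thread needs that monitor before it can count the latch down. A timed `await` is used so the demo returns `false` instead of hanging forever. `closeWithoutLock` shows one possible way to break the cycle (waiting without holding the monitor); it is an assumption for illustration, not the change actually made in Fluo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SharedCloseDeadlock {

  // Hypothetical stand-in for SharedResources: close and getTimestampTracker
  // both need the same object monitor, as in the traces above.
  static class Resources {
    final CountDownLatch closing = new CountDownLatch(1); // signals that close has started
    final CountDownLatch flushed = new CountDownLatch(1); // counted down by the writer thread
    private final Object tracker = new Object();

    synchronized Object getTimestampTracker() {
      return tracker;
    }

    // Mirrors the "main" trace: wait for the writer while holding the monitor.
    // CountDownLatch.await() does NOT release this object's monitor.
    synchronized boolean closeHoldingLock(long timeoutMs) throws InterruptedException {
      closing.countDown();
      return flushed.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Hypothetical alternative: wait for the writer without holding the monitor.
    boolean closeWithoutLock(long timeoutMs) throws InterruptedException {
      closing.countDown();
      return flushed.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  // Returns true if close observed the writer finishing before the timeout.
  static boolean runDemo(boolean holdLock, long timeoutMs) {
    Resources res = new Resources();

    // Plays the role of the "sharedBW" thread: while completing a batch it
    // calls back into the shared resources, then signals completion.
    Thread writer = new Thread(() -> {
      try {
        res.closing.await();
        res.getTimestampTracker(); // blocks while closeHoldingLock owns the monitor
        res.flushed.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();

    try {
      boolean finished = holdLock
          ? res.closeHoldingLock(timeoutMs)
          : res.closeWithoutLock(timeoutMs);
      writer.join(); // once close returns, the monitor is free and the writer can finish
      return finished;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("holding lock:     writer finished = " + runDemo(true, 300));
    System.out.println("without the lock: writer finished = " + runDemo(false, 5000));
  }
}
```

Running `main` prints `false` for the first case, because the writer can only acquire the monitor after the timed wait gives up, and `true` for the second. With an untimed `await()`, as in `SharedBatchWriter.close`, the first case would hang the same way the map task did.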
