keith-turner opened a new issue #1069: Saw deadlock in fluo map reduce load job
URL: https://github.com/apache/fluo/issues/1069
 
 
   While running the stress test, I saw MapReduce jobs hang when trying to 
close.  Running jstack on a MapReduce process showed the following deadlock.
   
   ```
   "main" #1 prio=5 os_prio=0 tid=0x00007fd42c016800 nid=0x5774 waiting on condition [0x00007fd434d87000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000ef5abf08> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at org.apache.fluo.core.impl.SharedBatchWriter.close(SharedBatchWriter.java:192)
        at org.apache.fluo.core.impl.SharedResources.close(SharedResources.java:211)
        - locked <0x00000000ef008a40> (a org.apache.fluo.core.impl.SharedResources)
        at org.apache.fluo.core.impl.Environment.close(Environment.java:254)
        at org.apache.fluo.core.client.FluoClientImpl.close(FluoClientImpl.java:116)
        at org.apache.fluo.mapreduce.FluoOutputFormat$2.close(FluoOutputFormat.java:96)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:682)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:805)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
   ```
   
   ```
   "Fluo-0001-001-sharedBW" #53 daemon prio=5 os_prio=0 tid=0x00007fd42d935800 nid=0x59ef waiting for monitor entry [0x00007fd411154000]
      java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.fluo.core.impl.SharedResources.getTimestampTracker(SharedResources.java:138)
        - waiting to lock <0x00000000ef008a40> (a org.apache.fluo.core.impl.SharedResources)
        at org.apache.fluo.core.impl.TransactionImpl.close(TransactionImpl.java:771)
        - locked <0x00000000eff41da0> (a org.apache.fluo.core.impl.TransactionImpl)
        at org.apache.fluo.core.impl.TransactionImpl.close(TransactionImpl.java:777)
        at org.apache.fluo.core.async.CommitManager$CQCommitObserver.finish(CommitManager.java:68)
        at org.apache.fluo.core.async.CommitManager$CQCommitObserver.committed(CommitManager.java:87)
        at org.apache.fluo.core.impl.TransactionImpl$FinishCommitStep.lambda$getMainOp$0(TransactionImpl.java:1377)
        at org.apache.fluo.core.impl.TransactionImpl$FinishCommitStep$$Lambda$88/441397001.apply(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.apache.fluo.core.impl.SharedBatchWriter$MutationBatch.countDown(SharedBatchWriter.java:74)
        at org.apache.fluo.core.impl.SharedBatchWriter$FlushTask.processBatches(SharedBatchWriter.java:123)
        at org.apache.fluo.core.impl.SharedBatchWriter$FlushTask.run(SharedBatchWriter.java:93)
        at java.lang.Thread.run(Thread.java:748)
   ```
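
   Reading the two traces together: "main" holds the SharedResources monitor (`0x00000000ef008a40`) while parked on a CountDownLatch in `SharedBatchWriter.close`, and the sharedBW thread is blocked waiting for that same monitor in `getTimestampTracker`, so it cannot finish the work the latch appears to be waiting on. The pattern can be reproduced in isolation with the rough sketch below. It is not Fluo code: `DeadlockSketch`, `Resources`, `touch`, `closeHoldingLock`, and `closeWithoutLock` are hypothetical names, and the latch wait uses a timeout (unlike the untimed `await` in the trace) only so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; these names are illustrative and are not Fluo classes.
final class DeadlockSketch {

  // Stand-in for a shared object guarded by a single monitor.
  static final class Resources {
    private final CountDownLatch flushed = new CountDownLatch(1);

    // Stand-in for a synchronized accessor such as getTimestampTracker().
    synchronized void touch() {}

    // Stand-in for the background flush thread: once close() has started,
    // it needs the monitor (via touch()) before it can signal completion.
    void startWriter(CountDownLatch closeStarted) {
      Thread writer = new Thread(() -> {
        try {
          closeStarted.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        touch();
        flushed.countDown();
      }, "sketch-sharedBW");
      writer.setDaemon(true);
      writer.start();
    }

    // Problematic shape: waits for the writer while holding the monitor.
    synchronized boolean closeHoldingLock(CountDownLatch closeStarted, long timeoutMs)
        throws InterruptedException {
      closeStarted.countDown();
      return flushed.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // One possible alternative: wait for the writer without the monitor.
    boolean closeWithoutLock(CountDownLatch closeStarted, long timeoutMs)
        throws InterruptedException {
      closeStarted.countDown();
      return flushed.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  // Returns true if close finished before the timeout, false if it stalled.
  static boolean closeCompletes(boolean holdLock, long timeoutMs) {
    Resources resources = new Resources();
    CountDownLatch closeStarted = new CountDownLatch(1);
    resources.startWriter(closeStarted);
    try {
      return holdLock
          ? resources.closeHoldingLock(closeStarted, timeoutMs)
          : resources.closeWithoutLock(closeStarted, timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("close while holding lock completes: " + closeCompletes(true, 500));
    System.out.println("close without holding lock completes: " + closeCompletes(false, 5000));
  }
}
```

   In the sketch, the first call stalls until its timeout because the writer can never enter `touch()`, while the second completes once the writer has run. Waiting outside the monitor is only one way to break the cycle; whether it suits `SharedResources.close` depends on what else that lock protects.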
