[ https://issues.apache.org/jira/browse/RATIS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953937#comment-16953937 ]
Hadoop QA commented on RATIS-706: --------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 55s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 5s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 2s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | ratis.logservice.TestLogServiceWithNetty | | | ratis.logservice.TestLogServiceWithGrpc | | | ratis.logservice.server.TestMetaServer | | | ratis.grpc.TestServerRestartWithGrpc | | | ratis.grpc.TestRaftReconfigurationWithGrpc | | | ratis.server.simulation.TestRaftReconfigurationWithSimulatedRpc | | | ratis.netty.TestRaftReconfigurationWithNetty | | | ratis.server.raftlog.TestRaftLogMetrics | | | ratis.grpc.TestRaftAsyncWithGrpc | | | ratis.grpc.TestWatchRequestWithGrpc | | | ratis.examples.filestore.TestFileStoreWithGrpc | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.3 Server=19.03.3 Image:yetus/ratis:date2019-10-17 | | JIRA Issue | RATIS-706 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982735/r706_20191011b.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs checkstyle compile | | uname | Linux 51181d5e7d30 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh | | git revision | master / 9cbf1ef | | maven | version: Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 2019-08-27T15:06:16Z) | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/PreCommit-RATIS-Build/1075/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/1075/testReport/ | | Max. process+thread count | 3368 (vs. ulimit of 5000) | | modules | C: ratis-common ratis-test U: . | | Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/1075/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Dead lock in GrpcClientRpc > -------------------------- > > Key: RATIS-706 > URL: https://issues.apache.org/jira/browse/RATIS-706 > Project: Ratis > Issue Type: Bug > Components: gRPC > Reporter: Marton Elek > Assignee: Tsz-wo Sze > Priority: Major > Attachments: jstack.txt, r706_20191011.patch, r706_20191011b.patch > > > I started an Ozone cluster on Kubernetes and started a freon test (ozone > freon ockg -n10000) > After a while I found that the one freon instance is not creating keys any > more. I checked the om RPC endpoint with ozone insight and no RPC messages > has been arrived. > Based on the jstack output we have a deadlock between > PeerProxyMap.handleException and GrpcClientRpc.sendRequestAsync. > I am not sure (yet) what is the exact problem, but based on the stack traces > It seems to be Ratis related. > {code} > Found one Java-level deadlock: > ============================= > "pool-2-thread-6": > waiting to lock monitor 0x00007f80356c8800 (object 0x000000033eb70a00, a > java.lang.Object), > which is held by > "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty > queue]" > "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty > queue]": > waiting to lock monitor 0x0000000001170980 (object 0x000000033eb99b10, a > org.apache.ratis.util.SlidingWindow$Client), > which is held by > "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty > queue]" > "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty > queue]": > waiting to lock monitor 0x00007f80356c8800 (object 0x000000033eb70a00, a > java.lang.Object), > which is held by > "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty > queue]" > Java stack information for the threads listed above: > =================================================== > "pool-2-thread-6": > at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103) > - waiting to lock <0x000000033eb70a00> (a java.lang.Object) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsyncUnordered(GrpcClientRpc.java:78) > at > org.apache.ratis.client.impl.UnorderedAsync.sendRequestWithRetry(UnorderedAsync.java:75) > at > org.apache.ratis.client.impl.UnorderedAsync.send(UnorderedAsync.java:59) > at > org.apache.ratis.client.impl.RaftClientImpl.sendWatchAsync(RaftClientImpl.java:139) > at > org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:282) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:198) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:161) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:346) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:482) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:435) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:473) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60) > - locked <0x00000003f2ba4240> (a > org.apache.hadoop.ozone.client.io.OzoneOutputStream) > at > org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710) > at > org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88) > at > org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615) > at > java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515) > at > java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.3/Thread.java:834) > "java.util.concurrent.ThreadPoolExecutor$Worker@77329f41[State = -1, empty > queue]": > at > org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348) > - waiting to lock <0x000000033eb99b10> (a > org.apache.ratis.util.SlidingWindow$Client) > at > org.apache.ratis.client.impl.OrderedAsync.resetSlidingWindow(OrderedAsync.java:121) > at > org.apache.ratis.client.impl.OrderedAsync$$Lambda$496/0x0000000840498440.accept(Unknown > Source) > at > org.apache.ratis.client.impl.RaftClientImpl.lambda$handleIOException$6(RaftClientImpl.java:349) > at > org.apache.ratis.client.impl.RaftClientImpl$$Lambda$427/0x00000008402e1840.accept(Unknown > Source) > at java.util.Optional.ifPresent(java.base@11.0.3/Optional.java:183) > at > org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:349) > at > org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236) > at > org.apache.ratis.client.impl.OrderedAsync$$Lambda$358/0x00000008402d7040.apply(Unknown > Source) > at > java.util.concurrent.CompletableFuture.uniExceptionally(java.base@11.0.3/CompletableFuture.java:986) > at > java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(java.base@11.0.3/CompletableFuture.java:970) > at > java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.3/CompletableFuture.java:506) > at > java.util.concurrent.CompletableFuture.completeExceptionally(java.base@11.0.3/CompletableFuture.java:2088) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:345) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.close(GrpcClientProtocolClient.java:334) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$400(GrpcClientProtocolClient.java:261) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.lambda$close$1(GrpcClientProtocolClient.java:141) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$$Lambda$435/0x00000008402d1c40.accept(Unknown > Source) > at java.util.Optional.ifPresent(java.base@11.0.3/Optional.java:183) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:141) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:74) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy$$Lambda$430/0x00000008402e1440.run(Unknown > Source) > at > org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231) > at > org.apache.ratis.util.LifeCycle$$Lambda$433/0x00000008402d1040.get(Unknown > Source) > at > org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251) > at > org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229) > at > org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70) > - locked <0x000000008c1ba0e8> (a > org.apache.ratis.util.PeerProxyMap$PeerAndProxy) > at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:127) > - locked <0x000000033eb70a00> (a java.lang.Object) > at > org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:136) > at > org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47) > at > org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:372) > at > org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236) > at > org.apache.ratis.client.impl.OrderedAsync$$Lambda$358/0x00000008402d7040.apply(Unknown > Source) > at > java.util.concurrent.CompletableFuture.uniExceptionally(java.base@11.0.3/CompletableFuture.java:986) > at > java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(java.base@11.0.3/CompletableFuture.java:970) > at > java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.3/CompletableFuture.java:506) > at > java.util.concurrent.CompletableFuture.completeExceptionally(java.base@11.0.3/CompletableFuture.java:2088) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:324) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$$Lambda$490/0x000000084049f040.accept(Unknown > Source) > at java.util.Optional.ifPresent(java.base@11.0.3/Optional.java:183) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:329) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:324) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:318) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$$Lambda$336/0x00000008403b2040.run(Unknown > Source) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113) > at > org.apache.ratis.util.TimeoutScheduler$$Lambda$345/0x00000008403ac440.accept(Unknown > Source) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133) > at > org.apache.ratis.util.TimeoutScheduler$$Lambda$349/0x00000008403a1440.run(Unknown > Source) > at > org.apache.ratis.util.LogUtils$1$$Lambda$448/0x00000008402d6840.run(Unknown > Source) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at > java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515) > at > java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.3/ScheduledThreadPoolExecutor.java:304) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.3/Thread.java:834) > "java.util.concurrent.ThreadPoolExecutor$Worker@df368f8[State = -1, empty > queue]": > at org.apache.ratis.util.PeerProxyMap.getProxy(PeerProxyMap.java:103) > - waiting to lock <0x000000033eb70a00> (a java.lang.Object) > at > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:66) > at > org.apache.ratis.client.impl.OrderedAsync.sendRequest(OrderedAsync.java:208) > at > org.apache.ratis.client.impl.OrderedAsync.sendRequestWithRetry(OrderedAsync.java:169) > at > org.apache.ratis.client.impl.OrderedAsync$$Lambda$449/0x00000008402d6c40.accept(Unknown > Source) > at > org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:278) > at > org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:294) > - locked <0x000000033eb99b10> (a > org.apache.ratis.util.SlidingWindow$Client) > at > org.apache.ratis.client.impl.OrderedAsync.lambda$scheduleWithTimeout$7(OrderedAsync.java:195) > at > org.apache.ratis.client.impl.OrderedAsync$$Lambda$444/0x00000008402d4840.run(Unknown > Source) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113) > at > org.apache.ratis.util.TimeoutScheduler$$Lambda$345/0x00000008403ac440.accept(Unknown > Source) > at > org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133) > at > org.apache.ratis.util.TimeoutScheduler$$Lambda$349/0x00000008403a1440.run(Unknown > Source) > at > org.apache.ratis.util.LogUtils$1$$Lambda$448/0x00000008402d6840.run(Unknown > Source) > at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) > at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) > at > java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.3/Executors.java:515) > at > java.util.concurrent.FutureTask.run(java.base@11.0.3/FutureTask.java:264) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.3/ScheduledThreadPoolExecutor.java:304) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3/ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.3/Thread.java:834) > Found 1 deadlock. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)