[ 
https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663946#comment-16663946
 ] 

Mukul Kumar Singh commented on HDDS-728:
----------------------------------------

Thanks for the review [~shashikant] and [~nandakumar131]. Patch v2 addresses 
the review comments. Please have a look at the latest patch.

1) In XceiverServerRatis we don't need to maintain stateMachineMap, 
RaftServerProxy already has a map to maintain this and the entry from that map 
is removed whenever we do group remove.
bq. done
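
To illustrate point 1, here is a minimal sketch of keeping a single map as the 
only source of truth for per-group state, so that removing a group cannot leave 
a stale entry in a second map. All class and method names here are hypothetical 
and chosen for illustration only; this is not the Ratis or Ozone API, and the 
String values merely stand in for a state machine object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative and are not the Ratis or
// Ozone APIs.
final class GroupRegistrySketch {
  /** The only map from group id to per-group state (a String stand-in). */
  private final Map<String, String> groups = new ConcurrentHashMap<>();

  /** Registers a group; fails if the group id is already present. */
  void addGroup(String groupId, String stateMachine) {
    if (groups.putIfAbsent(groupId, stateMachine) != null) {
      throw new IllegalStateException(groupId + " already exists");
    }
  }

  /** Removing the group drops its entry; there is no second map to clean. */
  void removeGroup(String groupId) {
    groups.remove(groupId);
  }

  /** Looks up a group and fails loudly when it is unknown. */
  String get(String groupId) {
    String stateMachine = groups.get(groupId);
    if (stateMachine == null) {
      throw new IllegalStateException(groupId + " not found");
    }
    return stateMachine;
  }
}
```

With one map, a lookup after removeGroup fails immediately instead of 
returning state for a group that no longer exists.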

2) In MiniOzoneClusterImpl, do we need this change? We can always wait for the 
datanode to get ready whenever we do a datanode restart.
bq. The changes are there to test for cases when SCM marks the datanode stale; 
I wanted to test for the case where SCM doesn't get time to check for stale 
datanodes.

3) I think it's better to have the executor service array in 
ContainerStateMachine itself and shut it down during close. Since we are now 
passing an array reference to the ContainerStateMachine constructor, it may 
give a FindBugs warning as well.
bq. I would like to keep the executor code out of the Container Manager so that 
the executor thread can be re-used.
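
The trade-off in point 3 can be sketched as follows. This is only an 
illustration under assumptions, not the code in the patch: the class names 
SharedExecutorSketch, Server and StateMachine are hypothetical. The owner 
creates the executors once and shuts them down itself, each state machine 
borrows them, and the constructor copies the array, which is the usual way to 
avoid the FindBugs EI_EXPOSE_REP2 warning about storing a caller's array.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this is not the HDDS-728 patch.
final class SharedExecutorSketch {

  /** Stand-in for a per-pipeline state machine that borrows executors. */
  static final class StateMachine {
    private final ExecutorService[] executors;

    StateMachine(ExecutorService[] shared) {
      // Copy the array so the caller cannot swap elements afterwards.
      this.executors = shared.clone();
    }

    /** Maps a container id to one executor so its work stays ordered. */
    ExecutorService executorFor(long containerId) {
      return executors[(int) Math.floorMod(containerId, (long) executors.length)];
    }

    void close() {
      // Intentionally leaves the executors running: the owner shuts them down.
    }
  }

  /** Stand-in for the server that owns the executors across state machines. */
  static final class Server implements AutoCloseable {
    private final ExecutorService[] executors;

    Server(int numExecutors) {
      executors = new ExecutorService[numExecutors];
      for (int i = 0; i < numExecutors; i++) {
        executors[i] = Executors.newSingleThreadExecutor();
      }
    }

    StateMachine newStateMachine() {
      return new StateMachine(executors);
    }

    @Override
    public void close() throws InterruptedException {
      for (ExecutorService e : executors) {
        e.shutdown();
      }
      for (ExecutorService e : executors) {
        e.awaitTermination(5, TimeUnit.SECONDS);
      }
    }
  }
}
```

In this sketch, closing one state machine does not affect the others, and the 
threads are released only when the owning server is closed.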

> Datanodes are going to dead state after some interval
> -----------------------------------------------------
>
>                 Key: HDDS-728
>                 URL: https://issues.apache.org/jira/browse/HDDS-728
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Filesystem
>    Affects Versions: 0.3.0
>            Reporter: Soumitra Sulav
>            Assignee: Mukul Kumar Singh
>            Priority: Major
>         Attachments: HDDS-728.001.patch, HDDS-728.002.patch, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-000002.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-000003.hwx.site.log, 
> hadoop-root-om-ctr-e138-1518143905142-541600-02-000002.hwx.site.log, 
> hadoop-root-scm-ctr-e138-1518143905142-541600-02-000002.hwx.site.log, 
> om-audit-ctr-e138-1518143905142-541600-02-000002.hwx.site.log
>
>
> Set up a 5-datanode Ozone cluster with HDP on top of it.
> After restarting all HDP services a few times, I encountered the issue below, 
> which is causing the HDP services to fail.
> The same exception was observed in an older setup. I thought it could have 
> been an issue with that setup, but I have now encountered the same issue in a 
> new setup as well.
> {code:java}
> 2018-10-24 10:42:03,308 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService: 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:03,342 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService: 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:04,466 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService: 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}


