[
https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886070#comment-16886070
]
Rajeshbabu Chintaguntla commented on RATIS-556:
-----------------------------------------------
{noformat}
Failed to read from log
org.apache.ratis.logservice.common.LogNotFoundException: 'testlog1'
at
org.apache.ratis.logservice.server.MetaStateMachine.processGetLogRequest(MetaStateMachine.java:382)
at
org.apache.ratis.logservice.server.MetaStateMachine.query(MetaStateMachine.java:213)
at
org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:547)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$7(RaftServerProxy.java:333)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$null$5(RaftServerProxy.java:328)
at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:109)
at
org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$6(RaftServerProxy.java:328)
at
java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
at
java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
at
org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:327)
at
org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:333)
at
org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:220)
at
org.apache.ratis.grpc.client.GrpcClientProtocolService$UnorderedRequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:276)
at
org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:240)
at
org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:168)
at
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
Sorry it's my mistake actually we should not give single quotes for log name
which is why the LogNotFoundException is coming. Actually the reads and writes
are working fine even when one of the node present in the group.
bq.At meta quorum, we can utilize PINGREQUEST & timeout to detect server
failure(though it's not yet implemented at workers to report liveliness) and
try changing the state of the log to CLOSE and start archiving.
At present ping request checks the peer present int he registered peers and
just add to the list and not doing any work to timeout. We need to provide new
API like REMOVEREQUEST to remove the peer from the list and also remove it from
the groups created to handle the log and also we need to add new peer to the
group so that data will be synced. All this need to be done when append
operations failed and notified to log state machine. Working on this.
> Detect node failures and add other workers to group serving the log and
> replicate the data of the log
> -----------------------------------------------------------------------------------------------------
>
> Key: RATIS-556
> URL: https://issues.apache.org/jira/browse/RATIS-556
> Project: Ratis
> Issue Type: Improvement
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
>
> Currently there is no way to detect the node failures at master log servers
> and add new nodes to the group serving the log. We need to analyze how Ozone
> is working in this case.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)