[ https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886070#comment-16886070 ]
Rajeshbabu Chintaguntla edited comment on RATIS-556 at 7/16/19 12:21 PM:
-------------------------------------------------------------------------
{noformat}
Failed to read from log
org.apache.ratis.logservice.common.LogNotFoundException: 'testlog1'
at org.apache.ratis.logservice.server.MetaStateMachine.processGetLogRequest(MetaStateMachine.java:382)
at org.apache.ratis.logservice.server.MetaStateMachine.query(MetaStateMachine.java:213)
at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:547)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$7(RaftServerProxy.java:333)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$5(RaftServerProxy.java:328)
at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:109)
at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$6(RaftServerProxy.java:328)
at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
at org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:327)
at org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:333)
at org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:220)
at org.apache.ratis.grpc.client.GrpcClientProtocolService$UnorderedRequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:276)
at org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:240)
at org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:168)
at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
Sorry, that was my mistake: the log name should not be given in single quotes,
which is why the LogNotFoundException was thrown. Reads and writes are actually
working fine even when one of the nodes is present in the group.
bq. At meta quorum, we can utilize PINGREQUEST & timeout to detect server
failure (though it's not yet implemented at workers to report liveness) and
try changing the state of the log to CLOSE and start archiving.
At present, the ping request only checks whether the peer is among the
registered peers and adds it to the list; it does no timeout handling. We need
to provide a new API, such as REMOVEREQUEST, to remove the peer from the list
at meta. We also need to remove the peer from the groups created to handle the
log and add a new peer to the group so that the data is synced there. We can
also close the log to stop reads from and writes to it.
All of this needs to be done when append operations fail and the log state
machine is notified.
Please let me know if this is a good plan. I am just trying it out.
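To make the ping-plus-timeout idea concrete, here is a minimal sketch of how the meta side might track liveness. Everything in it is an assumption for illustration: the class PeerLivenessTracker, its methods and the String peer ids are hypothetical and are not part of the Ratis LogService API. It only shows the bookkeeping (record a ping, list timed-out peers, drop a peer as a REMOVEREQUEST-style handler might), not the group reconfiguration itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a Ratis class): remembers the last ping time of each peer. */
class PeerLivenessTracker {
  private final long timeoutMs;
  private final Map<String, Long> lastPingMs = new ConcurrentHashMap<>();

  PeerLivenessTracker(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Called whenever a ping from a worker arrives; registers the peer if it is new. */
  void recordPing(String peerId, long nowMs) {
    lastPingMs.put(peerId, nowMs);
  }

  /** Returns the ids of peers whose last ping is older than the timeout, sorted by id. */
  List<String> expiredPeers(long nowMs) {
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastPingMs.entrySet()) {
      if (nowMs - e.getValue() > timeoutMs) {
        expired.add(e.getKey());
      }
    }
    Collections.sort(expired);
    return expired;
  }

  /** Forgets a peer, as a REMOVEREQUEST-style handler might; returns false if it was unknown. */
  boolean remove(String peerId) {
    return lastPingMs.remove(peerId) != null;
  }
}
```

A periodic task at meta could call expiredPeers with the current time, and for each returned peer trigger the removal from the peer list and from the affected log groups. Passing the time in as a parameter keeps the sketch easy to test.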
was (Author: rajeshbabu):
Sorry, that was my mistake: the log name should not be given in single quotes,
which is why the LogNotFoundException was thrown. Reads and writes are actually
working fine even when one of the nodes is present in the group.
bq. At meta quorum, we can utilize PINGREQUEST & timeout to detect server
failure (though it's not yet implemented at workers to report liveness) and
try changing the state of the log to CLOSE and start archiving.
At present, the ping request only checks whether the peer is among the
registered peers and adds it to the list; it does no timeout handling. We need
to provide a new API, such as REMOVEREQUEST, to remove the peer from the list
at meta. We also need to remove the peer from the groups created to handle the
log and add a new peer to the group so that the data is synced. All of this
needs to be done when append operations fail and the log state machine is
notified. Working on this.
> Detect node failures and add other workers to group serving the log and
> replicate the data of the log
> -----------------------------------------------------------------------------------------------------
>
> Key: RATIS-556
> URL: https://issues.apache.org/jira/browse/RATIS-556
> Project: Ratis
> Issue Type: Improvement
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
>
> Currently there is no way to detect the node failures at master log servers
> and add new nodes to the group serving the log. We need to analyze how Ozone
> is working in this case.