ivandika3 commented on PR #4357:
URL: https://github.com/apache/ozone/pull/4357#issuecomment-1918804433
@szetszwo Sorry for the random comment, but from my understanding we already
have logic to retry on the other datanodes in
`XceiverClientGrpc#sendCommandWithRetry`:
```java
for (DatanodeDetails dn : datanodeList) {
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Executing command {} on datanode {}",
          processForDebug(request), dn);
    }
    // In case the command gets retried on a 2nd datanode,
    // sendCommandAsyncCall will create a new channel and async stub
    // in case these don't exist for the specific datanode.
    reply.addDatanode(dn);
    responseProto = sendCommandAsync(request, dn).getResponse().get();
    if (validators != null && !validators.isEmpty()) {
      for (Validator validator : validators) {
        validator.accept(request, responseProto);
      }
    }
    ...
```
The call trace is `ContainerProtocolCalls#getBlock` ->
`XceiverClientGrpc#sendCommand` ->
`XceiverClientGrpc#sendCommandWithTraceIDAndRetry` ->
`XceiverClientGrpc#sendCommandWithRetry`.
Does this mean that getBlock can be attempted up to 27 times in the worst case
(3 from `sendCommandWithRetry`, assuming RATIS/THREE, * 3 from
`tryEachDatanode` * 3 from `BlockInputStream`'s retry policy)? The same would
apply to readChunk, see https://github.com/apache/ozone/pull/4336.
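To make the arithmetic concrete, here is a rough sketch of how the three layers
would multiply if each one exhausts its budget on every failure. This is not
Ozone code: the class and method names are hypothetical, and the value of 3 per
layer is only the assumption stated above.

```java
// Hypothetical illustration only; none of these names exist in Ozone.
final class RetryAmplification {

  /**
   * Counts the attempts made when every call fails and each nested
   * retry layer runs through its full budget before giving up.
   */
  static int worstCaseAttempts(int streamRetries,
      int tryEachDatanodeRounds, int datanodes) {
    int attempts = 0;
    // Outermost layer: assumed BlockInputStream retry policy.
    for (int s = 0; s < streamRetries; s++) {
      // Middle layer: assumed tryEachDatanode loop.
      for (int r = 0; r < tryEachDatanodeRounds; r++) {
        // Innermost layer: assumed per-datanode loop in sendCommandWithRetry.
        for (int d = 0; d < datanodes; d++) {
          attempts++; // one failed getBlock call
        }
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // Assumed budgets: 3 stream retries, 3 rounds, 3 datanodes (RATIS/THREE).
    System.out.println(worstCaseAttempts(3, 3, 3));
  }
}
```

If any layer stops early or does not actually nest inside the others, the real
count would be lower than this upper bound.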
Please correct me if I'm mistaken. Thanks in advance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]