[
https://issues.apache.org/jira/browse/HDDS-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788126#comment-17788126
]
Uma Maheswara Rao G commented on HDDS-9709:
-------------------------------------------
> The fundamental difference lies in the client-side retry policy. Should the
> Ozone client (BlockInputStream) retry on NO_REPLICA_FOUND with a
> force-cache-refresh, or should it fail fast?
I think the client can keep its current behavior of failing when it sees an
empty list, as long as the server takes care of the empty DN list caching
problem.
I see your patch handles this at OM by making sure an empty DN list is never
cached. If SCM gives an empty list, that's a true empty list and a retry may
not help. We cannot support the case where all 3 DNs have failed or been
removed from SCM.
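
To make the OM-side idea concrete, here is a minimal sketch of a cache that
refuses to store an empty DN list, so the next read goes back to SCM. This is
not the code from the patch: the class PipelineCacheSketch, its methods, and
the LongFunction standing in for the SCM lookup are all hypothetical names
used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch of an OM-side cache (container ID -> datanode list)
 * that never caches an empty list. None of these names are Ozone APIs.
 */
final class PipelineCacheSketch {
  private final Map<Long, List<String>> cache = new ConcurrentHashMap<>();

  // Assumed stand-in for the SCM call that returns a container's datanodes.
  private final LongFunction<List<String>> scmLookup;

  PipelineCacheSketch(LongFunction<List<String>> scmLookup) {
    this.scmLookup = scmLookup;
  }

  /** Returns the cached datanodes, or asks SCM when nothing is cached. */
  List<String> getDatanodes(long containerId) {
    List<String> cached = cache.get(containerId);
    if (cached != null) {
      return cached;
    }
    List<String> fresh = scmLookup.apply(containerId);
    // Only a non-empty list is cached. An empty answer is still returned to
    // the caller (who may fail fast), but the next request asks SCM again.
    if (!fresh.isEmpty()) {
      cache.put(containerId, fresh);
    }
    return fresh;
  }

  /** Reports whether an entry is currently cached for the container. */
  boolean isCached(long containerId) {
    return cache.containsKey(containerId);
  }
}
```

With a cache like this, the client can keep failing fast on an empty list,
because an empty result from SCM is passed through once and never served
again from the cache.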
> NO_REPLICA_FOUND should trigger a OM pipeline cache refresh
> -----------------------------------------------------------
>
> Key: HDDS-9709
> URL: https://issues.apache.org/jira/browse/HDDS-9709
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Duong
> Priority: Major
> Labels: pull-request-available
>
> Today, container pipelines are cached in OM and the cache data consistency is
> eventually ensured by client behavior. This means, if a container is
> replicated to another set of datanodes, the client detects this change when
> using the outdated cached pipeline to read data from datanodes and requests
> OM to refresh the pipeline cache from SCM.
> When the datanodes belonging to a container go offline, there is a chance
> that an empty pipeline could be cached in OM. However, when the client gets
> an empty pipeline, it fails to ask OM to refresh the pipeline.
> {code:java}
> Caused by: java.lang.IllegalArgumentException: NO_REPLICA_FOUND
>   at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
>   at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:164)
>   at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:157)
>   at org.apache.hadoop.hdds.scm.storage.BlockInputStream.acquireClient(BlockInputStream.java:285)
>   at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:238)
>   at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
>   at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
> {code}