[
https://issues.apache.org/jira/browse/HDDS-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787370#comment-17787370
]
Uma Maheswara Rao G commented on HDDS-9707:
-------------------------------------------
[~georgeJahad]
I was discussing the following case with [~duong]:
Let’s say SCM has a pipeline with nodes: DN1, DN2, DN3
at time t1: all nodes went down.
at time t2: the OM cached pipeline has no nodes: []
at time t3: all DNs came back. SCM has the pipeline with [DN1, DN2, DN3].
If a client reads now, since the OM cache has an empty pipeline, the client will
just fail, I guess.
Are we periodically fetching the pipelines and caching them, or are we building
the cache as we get the pipelines from SCM?
Which seems to be the case.
A quick fix could potentially be to not cache an entry if its pipeline node set
is empty?
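To make the quick fix concrete, here is a minimal Java sketch of a read-through cache that skips empty pipelines. This is not the actual OM code: the class name PipelineCacheSketch, the container-ID key, the node list as plain strings, and the scmLookup function standing in for the SCM call are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch, not Ozone's real OM cache: a read-through cache that
 * never stores a pipeline whose node list is empty.
 */
final class PipelineCacheSketch {
  // Assumed key: container ID. Assumed value: datanode names in the pipeline.
  private final Map<Long, List<String>> cache = new ConcurrentHashMap<>();

  // Stand-in for the call that asks SCM for the current pipeline nodes.
  private final Function<Long, List<String>> scmLookup;

  PipelineCacheSketch(Function<Long, List<String>> scmLookup) {
    this.scmLookup = scmLookup;
  }

  List<String> getNodes(long containerId) {
    List<String> cached = cache.get(containerId);
    if (cached != null) {
      return cached; // cache hit: only non-empty pipelines are ever stored
    }
    List<String> fresh = scmLookup.apply(containerId);
    if (!fresh.isEmpty()) {
      // Cache only usable pipelines, so an empty result seen while all
      // datanodes were down (t2) cannot be served after they return (t3).
      cache.put(containerId, fresh);
    }
    return fresh;
  }
}
```

With this shape, a read at t2 still gets the empty pipeline and fails, but a read at t3 goes back to the lookup and gets [DN1, DN2, DN3] instead of the stale empty entry. It does not cover a cached pipeline that is non-empty but out of date, which would need a separate refresh or invalidation path.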
[~duong] thoughts?
> Intermittent NO_REPLICA_FOUND errors caused by OM pipeline cache
> ----------------------------------------------------------------
>
> Key: HDDS-9707
> URL: https://issues.apache.org/jira/browse/HDDS-9707
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: George Jahad
> Priority: Major
>
> We are currently seeing a problem in one of our clusters where we are getting
> NO_REPLICA_FOUND errors on some reads for data which is there. The problem
> goes away if we restart the cluster.
>
> We know it is not a networking problem because we can ssh into the datanodes.