[ 
https://issues.apache.org/jira/browse/HDDS-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787370#comment-17787370
 ] 

Uma Maheswara Rao G commented on HDDS-9707:
-------------------------------------------

[~georgeJahad] 
I was discussing the following case with [~duong]:
Let's say SCM has a pipeline with nodes DN1, DN2, DN3.
At time t1: all nodes go down.
At time t2: the OM-cached pipeline has no nodes ([]).
At time t3: all DNs come back, and SCM has the pipeline with [DN1, DN2, DN3].
If a client reads now, I guess it will just fail, since the OM cache still holds the empty pipeline.
Are we periodically fetching the pipelines and caching them, or are we building the cache as we get the pipelines from SCM?

It seems to be the latter.
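To make the t1/t2/t3 sequence concrete, here is a minimal simulation. It assumes OM fills its pipeline cache lazily on first lookup and caches whatever SCM returns. FakeScm, NaivePipelineCache and read_block are hypothetical stand-ins, not Ozone classes.

```python
# Hypothetical sketch: SCM and the OM cache are reduced to dicts.
# Assumes OM populates its cache lazily on the first lookup of a pipeline.

class FakeScm:
    """Stands in for SCM: maps pipeline id -> datanodes currently in it."""

    def __init__(self):
        self.pipelines = {"p1": ["DN1", "DN2", "DN3"]}

    def get_pipeline(self, pipeline_id):
        return list(self.pipelines[pipeline_id])


class NaivePipelineCache:
    """Caches whatever SCM returns, including an empty node list."""

    def __init__(self, scm):
        self.scm = scm
        self.cache = {}

    def get(self, pipeline_id):
        if pipeline_id not in self.cache:
            self.cache[pipeline_id] = self.scm.get_pipeline(pipeline_id)
        return self.cache[pipeline_id]


def read_block(cache, pipeline_id):
    """Hypothetical read path: fails when the pipeline has no nodes."""
    nodes = cache.get(pipeline_id)
    if not nodes:
        return "NO_REPLICA_FOUND"
    return "read from " + nodes[0]


scm = FakeScm()
om_cache = NaivePipelineCache(scm)

scm.pipelines["p1"] = []                       # t1: all DNs go down
print(read_block(om_cache, "p1"))              # t2: OM caches the empty pipeline
scm.pipelines["p1"] = ["DN1", "DN2", "DN3"]    # t3: all DNs come back
print(read_block(om_cache, "p1"))              # still fails: stale empty entry
```

Both reads print NO_REPLICA_FOUND. The second one fails even though SCM now reports all three nodes.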

Could a quick fix be to avoid caching entries whose pipeline node set is empty?

[~duong] thoughts?
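One possible shape for that quick fix, sketched under the same assumptions (lazy cache, hypothetical FakeScm and SkipEmptyPipelineCache rather than real Ozone classes): an empty node list is returned to the caller but never stored, so the next lookup goes back to SCM.

```python
# Hypothetical sketch of "don't cache pipelines with an empty node set".

class FakeScm:
    """Stands in for SCM: maps pipeline id -> datanodes currently in it."""

    def __init__(self):
        self.pipelines = {"p1": ["DN1", "DN2", "DN3"]}

    def get_pipeline(self, pipeline_id):
        return list(self.pipelines[pipeline_id])


class SkipEmptyPipelineCache:
    """Stores a pipeline only if it has at least one node."""

    def __init__(self, scm):
        self.scm = scm
        self.cache = {}

    def get(self, pipeline_id):
        nodes = self.cache.get(pipeline_id)
        if nodes:                       # cache hit with a usable node list
            return nodes
        nodes = self.scm.get_pipeline(pipeline_id)
        if nodes:                       # only non-empty results are cached
            self.cache[pipeline_id] = nodes
        return nodes


scm = FakeScm()
om_cache = SkipEmptyPipelineCache(scm)

scm.pipelines["p1"] = []                       # t1: all DNs go down
first = om_cache.get("p1")                     # t2: empty result, not cached
scm.pipelines["p1"] = ["DN1", "DN2", "DN3"]    # t3: all DNs come back
second = om_cache.get("p1")                    # refetched from SCM
print(first, second)
```

The trade-off, as sketched, is that every lookup goes to SCM while a pipeline has no nodes. The sketch also does not cover a cached non-empty pipeline whose nodes later change.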

 

> Intermittent NO_REPLICA_FOUND errors caused by OM pipeline cache
> ----------------------------------------------------------------
>
>                 Key: HDDS-9707
>                 URL: https://issues.apache.org/jira/browse/HDDS-9707
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: George Jahad
>            Priority: Major
>
> We are currently seeing a problem in one of our clusters where some reads
> fail with NO_REPLICA_FOUND errors even though the data is there. The problem
> goes away if we restart the cluster.
>  
> We know it is not a networking problem because we can ssh into the datanodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
