Soumitra Sulav created HDDS-14761:
-------------------------------------

             Summary: EC replica read fails at most once with GetBlock error after offline reconstruction
                 Key: HDDS-14761
                 URL: https://issues.apache.org/jira/browse/HDDS-14761
             Project: Apache Ozone
          Issue Type: Bug
          Components: OM
    Affects Versions: 2.1.0
            Reporter: Soumitra Sulav


 

An EC replica read fails with a GetBlock error after offline reconstruction when the SCM client serves container locations from its cache.

Steps to reproduce:
 # Created an EC key with {{rs-3-2-1024k}}.
 # Ran key info/get to cache the container replica info in the SCM client.
 # Confirmed from the SCM audit logs that the calls were not going to SCM and were served from the cache.
 # Brought down one of the replica nodes and waited for EC offline reconstruction.
 # Ran the key get again; it failed with an exception at GetBlock.
 # Observed an SCM call {{op=GET_CONTAINER_WITH_PIPELINE_BATCH}}, which indicates that the cache was refreshed.
 # The next key get command succeeded.

*CLI commands:*
{code:java}
# ozone admin container info 1003
Container id: 1003
Pipeline id: 716aecb6-f1fc-4e50-b603-c547177adb7c
Container State: CLOSED
Datanodes: [7752d3a1-06b3-42fb-bdf7-2886d90b4bdb/node-5.vpc.domain.com,
5d343a9a-5649-44a7-80f1-9c76f16d611e/node-4.vpc.domain.com,
eba28c30-aba2-4f9c-99b2-86fbb4828cae/node-7.vpc.domain.com,
f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4/node-3.vpc.domain.com,
43ee72fb-a6c2-4e8c-a36e-8613d8e0fd38/node-6.vpc.domain.com]

# Shut down the replica on node-3.vpc.domain.com

# ozone admin container info 1003
Container id: 1003
Pipeline id: 61dd8be1-5959-4c06-8fb8-79db365b7682
Container State: CLOSED
Datanodes: [7752d3a1-06b3-42fb-bdf7-2886d90b4bdb/node-5.vpc.domain.com,
5d343a9a-5649-44a7-80f1-9c76f16d611e/node-4.vpc.domain.com,
eba28c30-aba2-4f9c-99b2-86fbb4828cae/node-7.vpc.domain.com,
43ee72fb-a6c2-4e8c-a36e-8613d8e0fd38/node-6.vpc.domain.com]

# EC offline reconstruction recreates the replica on node-1.vpc.domain.com

# ozone admin container info 1003
Container id: 1003
Pipeline id: 5a21c624-ada4-4560-bd00-7eb897488e54
Container State: CLOSED
Datanodes: [7752d3a1-06b3-42fb-bdf7-2886d90b4bdb/node-5.vpc.domain.com,
5d343a9a-5649-44a7-80f1-9c76f16d611e/node-4.vpc.domain.com,
b3984801-0289-473a-b1be-3b3f10a27d83/node-1.vpc.domain.com,
eba28c30-aba2-4f9c-99b2-86fbb4828cae/node-7.vpc.domain.com,
43ee72fb-a6c2-4e8c-a36e-8613d8e0fd38/node-6.vpc.domain.com]

# ozone sh key get vol-scmfail-k7k14/buck-scmfail-k7k14/key_5mb_ec_1 download_key_5mb_ec_17

26/03/03 08:08:43 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4, Nodes: f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4(node-3.vpc.domain.com/10.65.8.219) ReplicaIndex: 3, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2026-03-03T08:08:43.740-08:00[America/Los_Angeles]].
26/03/03 08:08:43 INFO storage.BlockInputStream: Unable to read information for block conID: 1003 locID: 117883640217601008 bcsId: 0 replicaIndex: null from pipeline PipelineID=f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2026-03-03 08:08:43,777 | INFO  | SCMAudit | user=om/[email protected] | ip=10.65.14.58 | op=GET_CONTAINER_WITH_PIPELINE_BATCH {containerIDs=#1003,} | ret=SUCCESS |

# The next key get succeeds because the stale cache entry was invalidated and refreshed from SCM.
{code}
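A minimal Java sketch of the failure pattern above, assuming a simple client-side map from container ID to datanode list that is only dropped after a failed read. {{LocationCache}}, {{readKey}} and the short node names are hypothetical and illustrative only; this is not Ozone's SCM client code. For simplicity it treats any dead node in the cached list as a read failure. It shows why the read fails exactly once: the first read after reconstruction still uses the cached entry pointing at node-3, the failure invalidates that entry, and the following read fetches fresh locations from SCM.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongFunction;

/** Hypothetical model of a client-side container location cache (not Ozone's API). */
public class StaleLocationCacheDemo {

  /** Caches container ID -> datanode list fetched from a simulated SCM lookup. */
  static final class LocationCache {
    private final Map<Long, List<String>> entries = new HashMap<>();
    private final LongFunction<List<String>> scmLookup;
    int scmCalls = 0; // number of simulated SCM round trips

    LocationCache(LongFunction<List<String>> scmLookup) {
      this.scmLookup = scmLookup;
    }

    /** Returns the cached locations, asking SCM only on a cache miss. */
    List<String> get(long containerId) {
      return entries.computeIfAbsent(containerId, id -> {
        scmCalls++;
        return scmLookup.apply(id);
      });
    }

    void invalidate(long containerId) {
      entries.remove(containerId);
    }
  }

  /** Simulated read: fails if any cached node is down, and then drops the stale entry. */
  static boolean readKey(LocationCache cache, long containerId, Set<String> liveNodes) {
    for (String node : cache.get(containerId)) {
      if (!liveNodes.contains(node)) {
        // Simulated GetBlock failure against a stale replica location.
        cache.invalidate(containerId);
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<Long, List<String>> scmView = new HashMap<>();
    scmView.put(1003L, List.of("node-5", "node-4", "node-3", "node-7", "node-6"));
    LocationCache cache = new LocationCache(id -> scmView.get(id));

    Set<String> live = Set.of("node-5", "node-4", "node-3", "node-7", "node-6");
    System.out.println("read 1: " + readKey(cache, 1003L, live)); // warms the cache

    // node-3 goes down; reconstruction places the replica on node-1.
    scmView.put(1003L, List.of("node-5", "node-4", "node-1", "node-7", "node-6"));
    live = Set.of("node-5", "node-4", "node-1", "node-7", "node-6");

    System.out.println("read 2: " + readKey(cache, 1003L, live)); // stale entry, fails
    System.out.println("read 3: " + readKey(cache, 1003L, live)); // refreshed, succeeds
    System.out.println("SCM lookups: " + cache.scmCalls);
  }
}
{code}

Running {{main}} prints {{true}}, {{false}}, {{true}} and then {{SCM lookups: 2}}: one failed read followed by a single refresh, analogous to the lone {{GET_CONTAINER_WITH_PIPELINE_BATCH}} audit entry above.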
 


