[
https://issues.apache.org/jira/browse/HDDS-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krishna Kumar Asawa reassigned HDDS-14761:
------------------------------------------
Assignee: Sadanand Shenoy
> EC replica read fails at most once with GetBlock error after offline
> reconstruction
> ----------------------------------------------------------------------------------
>
> Key: HDDS-14761
> URL: https://issues.apache.org/jira/browse/HDDS-14761
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Affects Versions: 2.1.0
> Reporter: Soumitra Sulav
> Assignee: Sadanand Shenoy
> Priority: Major
>
>
> An EC replica read fails with a GetBlock error after offline reconstruction
> when the container replica cache in the SCM client is used.
> Steps to reproduce:
> # Created an EC key with {{rs-3-2-1024k}}.
> # Ran key info/get to cache the container replica info in the SCM client.
> # Verified from the SCM audit read logs that the calls were served from the
> cache and did not go to SCM.
> # Brought down one of the replicas and waited for EC offline reconstruction.
> # Ran the key get again; it failed with an exception at GetBlock.
> # Observed an SCM call {{op=GET_CONTAINER_WITH_PIPELINE_BATCH}}, which
> indicates that the cache was refreshed.
> # The next key get command succeeded.
> *CLI commands:*
> {code:java}
> # ozone admin container info 1003
> Container id: 1003
> Pipeline id: 716aecb6-f1fc-4e50-b603-c547177adb7c
> Container State: CLOSED
> Datanodes: [7752d3a1-06b3-42fb-bdf7-2886d90b4bdb/node-5.vpc.domain.com,
> 5d343a9a-5649-44a7-80f1-9c76f16d611e/node-4.vpc.domain.com,
> eba28c30-aba2-4f9c-99b2-86fbb4828cae/node-7.vpc.domain.com,
> f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4/node-3.vpc.domain.com,
> 43ee72fb-a6c2-4e8c-a36e-8613d8e0fd38/node-6.vpc.domain.com]
> # shutdown replica node-3.vpc.domain.com
> # ozone admin container info 1003
> Container id: 1003
> Pipeline id: 61dd8be1-5959-4c06-8fb8-79db365b7682
> Container State: CLOSED
> Datanodes: [7752d3a1-06b3-42fb-bdf7-2886d90b4bdb/node-5.vpc.domain.com,
> 5d343a9a-5649-44a7-80f1-9c76f16d611e/node-4.vpc.domain.com,
> eba28c30-aba2-4f9c-99b2-86fbb4828cae/node-7.vpc.domain.com,
> 43ee72fb-a6c2-4e8c-a36e-8613d8e0fd38/node-6.vpc.domain.com]
> # EC offline reconstruction happens on node-1.vpc.domain.com
> # ozone admin container info 1003
> Container id: 1003
> Pipeline id: 5a21c624-ada4-4560-bd00-7eb897488e54
> Container State: CLOSED
> Datanodes: [7752d3a1-06b3-42fb-bdf7-2886d90b4bdb/node-5.vpc.domain.com,
> 5d343a9a-5649-44a7-80f1-9c76f16d611e/node-4.vpc.domain.com,
> b3984801-0289-473a-b1be-3b3f10a27d83/node-1.vpc.domain.com,
> eba28c30-aba2-4f9c-99b2-86fbb4828cae/node-7.vpc.domain.com,
> 43ee72fb-a6c2-4e8c-a36e-8613d8e0fd38/node-6.vpc.domain.com]
> # ozone sh key get vol-scmfail-k7k14/buck-scmfail-k7k14/key_5mb_ec_1
> download_key_5mb_ec_17
> 26/03/03 08:08:43 ERROR scm.XceiverClientGrpc: Failed to execute command
> GetBlock on the pipeline Pipeline[ Id: f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4,
> Nodes:
> f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4(node-3.vpc.domain.com/10.65.8.219)
> ReplicaIndex: 3, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:,
> CreationTimestamp2026-03-03T08:08:43.740-08:00[America/Los_Angeles]].
> 26/03/03 08:08:43 INFO storage.BlockInputStream: Unable to read information
> for block conID: 1003 locID: 117883640217601008 bcsId: 0 replicaIndex: null
> from pipeline PipelineID=f6b6e9e8-7ca7-4612-9a1c-ad1a9477c7e4:
> java.util.concurrent.ExecutionException:
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io
> exception
> 2026-03-03 08:08:43,777 | INFO | SCMAudit |
> user=om/[email protected] | ip=10.65.14.58 |
> op=GET_CONTAINER_WITH_PIPELINE_BATCH {containerIDs=#1003,} | ret=SUCCESS |
> # Next key get works as cache is invalidated.{code}
>
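
For illustration only, the sequence reported above (a read against stale cached locations fails, the cache is refreshed, the next read works) can be contrasted with a client that refreshes and retries inside the same call. The sketch below is not Ozone code and is not the fix for this issue: the class, the LocationSource and BlockReader interfaces, and the node names are all hypothetical stand-ins, and it assumes a single retry is acceptable.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of refresh-and-retry on a stale location cache. */
final class RefreshOnFailureRead {

  /** Hypothetical stand-in for a container location lookup served by SCM. */
  interface LocationSource {
    List<String> lookup(long containerId);
  }

  /** Hypothetical stand-in for a block read against a set of datanodes. */
  interface BlockReader {
    String read(long containerId, List<String> nodes) throws IOException;
  }

  private final Map<Long, List<String>> cache = new ConcurrentHashMap<>();
  private final LocationSource source;
  private final BlockReader reader;

  RefreshOnFailureRead(LocationSource source, BlockReader reader) {
    this.source = source;
    this.reader = reader;
  }

  String read(long containerId) throws IOException {
    // Serve locations from the cache; only a miss goes to the source.
    List<String> nodes = cache.computeIfAbsent(containerId, source::lookup);
    try {
      return reader.read(containerId, nodes);
    } catch (IOException firstFailure) {
      // The cached locations may be stale (for example, a replica was
      // reconstructed on another node). Drop the entry, look it up again,
      // and retry exactly once; a second failure propagates to the caller.
      cache.remove(containerId);
      List<String> fresh = source.lookup(containerId);
      cache.put(containerId, fresh);
      return reader.read(containerId, fresh);
    }
  }

  public static void main(String[] args) throws IOException {
    AtomicInteger lookups = new AtomicInteger();
    // First lookup returns the old node, later lookups return the new one.
    LocationSource source = id ->
        lookups.incrementAndGet() == 1 ? List.of("node-3") : List.of("node-1");
    // Reads that target the node that was shut down fail.
    BlockReader reader = (id, nodes) -> {
      if (nodes.contains("node-3")) {
        throw new IOException("UNAVAILABLE: io exception");
      }
      return "data from " + nodes;
    };
    RefreshOnFailureRead client = new RefreshOnFailureRead(source, reader);
    System.out.println(client.read(1003L));
    System.out.println("lookups: " + lookups.get());
  }
}
```

In this sketch the caller never sees the first failure, at the cost of one extra lookup and one extra read attempt. Whether that is the right behaviour for the SCM client cache is for the assignee to decide.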
--
This message was sent by Atlassian Jira
(v8.20.10#820010)