[
https://issues.apache.org/jira/browse/HDDS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921441#comment-16921441
]
Shashikant Banerjee commented on HDDS-2076:
-------------------------------------------
From the OM logs, container 7 exists on the pipeline
c7731959-c6cd-4d98-aee2-a502f5963142:
{code:java}
2019-08-30 12:44:16,772 | INFO | OMAudit | user=msingh | ip=127.0.0.1 |
op=COMMIT_KEY
{blockID={containerID=7, localID=102704691060015900}, length=16384, offset=0,
token=null, pipeline=Pipeline[ Id: c7731959-c6cd-4d98-aee2-a502f5963142, Nodes:
6412537d-892a-4a42-b29d-cc871e03e01f{ip: 192.168.0.103, host: 192.168.0.103,
networkLocation: /default-rack, certSerialId:
null}e309d977-2464-4a16-8e31-b9820831456f{ip: 192.168.0.103, host:
192.168.0.103, networkLocation: /default-rack, certSerialId:
null}35341ac8-c24f-4d2b-afd2-d8bd6936f0c4{ip: 192.168.0.103, host:
192.168.0.103, networkLocation: /default-rack, certSerialId: null}, Type:RATIS,
Factor:THREE, State:OPEN], createVersion=0}
2019-08-30 12:45:02,062 [RatisApplyTransactionExecutor 7] INFO
interfaces.Container (KeyValueContainer.java:flushAndSyncDB(382)) - Container 7
is synced with bcsId 3832.
2019-08-30 12:45:02,084 [RatisApplyTransactionExecutor 7] INFO
interfaces.Container (KeyValueContainer.java:flushAndSyncDB(382)) - Container 7
is synced with bcsId 3832.
2019-08-30 12:45:02,102 [RatisApplyTransactionExecutor 7] INFO
interfaces.Container (KeyValueContainer.java:flushAndSyncDB(382)) - Container 7
is synced with bcsId 3832.
{code}
However, the failing read happens after the container has been closed and
replicated to dn 205d2fcf-b2d7-4823-8322-d725caa4f175, which is not part of the
original pipeline:
{code:java}
2019-08-30 12:51:19,501 [grpc-default-executor-18] INFO
keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) -
Operation: GetBlock : Trace ID: 357bbecea87d4330:357bbecea87d4330:0:0 :
Message: Unable to find the block with bcsID 2515 .Container 7 bcsId is 0. :
Result: UNKNOWN_BCSID
2019-08-30 12:51:19,501 [grpc-default-executor-18] INFO
keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) -
Operation: GetBlock : Trace ID: 357bbecea87d4330:357bbecea87d4330:0:0 :
Message: Unable to find the block with bcsID 2515 .Container 7 bcsId is 0. :
Result: UNKNOWN_BCSID
2019-08-30 12:51:19,501 | ERROR | DNAudit | user=null |
ip=null | op=GET_BLOCK {blockData=conID: 7 locID: 102704691125289777 bcsId:
2515} | ret=FAILURE | java.lang.Exception: Unable to find the block with bcsID
2515 .Container 7 bcsId is 0. at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
2019-08-30 12:51:19,501 [pool-224-thread-6] ERROR scm.XceiverClientGrpc
(XceiverClientGrpc.java:sendCommandWithRetry(293)) - Failed to execute command
cmdType: GetBlock traceID: "357bbecea87d4330:357bbecea87d4330:0:0" containerID:
7 datanodeUuid: "205d2fcf-b2d7-4823-8322-d725caa4f175" getBlock { blockID {
containerID: 7 localID: 102704691125289777 blockCommitSequenceId: 2515 }}
{code}
This implies that, after the container close completed and the datanodes were
restarted, the replica copied by replicaManager was corrupted. The corrupted
replica was then recorded in SCM, and a read request served from it failed.
We need to investigate further why the replicated container is corrupted.
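For reference, the failure in the datanode log is consistent with a check that rejects a GetBlock request whenever the bcsId recorded for the container is lower than the bcsId the client asks for. The following is only a minimal, self-contained sketch of that comparison, using the values from the logs above. The class, method, and exception names are hypothetical stand-ins and not the actual Ozone code.

```java
// Sketch only: all names here are hypothetical, not the real Ozone classes.
final class BcsIdCheckSketch {

  /** Stand-in for an exception carrying the UNKNOWN_BCSID result. */
  static final class UnknownBcsIdException extends RuntimeException {
    UnknownBcsIdException(String message) {
      super(message);
    }
  }

  /**
   * Rejects a read when the container's committed sequence id is behind
   * the sequence id the client expects the block to have been written at.
   */
  static void validateBcsId(long containerId, long containerBcsId,
      long requestedBcsId) {
    if (containerBcsId < requestedBcsId) {
      throw new UnknownBcsIdException(
          "Unable to find the block with bcsID " + requestedBcsId
              + " .Container " + containerId
              + " bcsId is " + containerBcsId + ".");
    }
  }

  public static void main(String[] args) {
    // Replica on the original pipeline: bcsId 3832 covers the request.
    validateBcsId(7, 3832, 2515);

    // Replicated copy reporting bcsId 0: the same request is rejected.
    try {
      validateBcsId(7, 0, 2515);
    } catch (UnknownBcsIdException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this assumption, a replica whose bcsId is 0 fails every read that carries a non-zero bcsID, which matches the replicated container losing its committed state.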
> Read fails because the block cannot be located in the container
> ---------------------------------------------------------------
>
> Key: HDDS-2076
> URL: https://issues.apache.org/jira/browse/HDDS-2076
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client, Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: MiniOzoneChaosCluster
> Attachments: log.zip
>
>
> Read fails as the client is not able to read the block from the container.
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
> at
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
> at
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
> 2019-08-30 12:51:20,081 | INFO | SCMAudit | user=msingh | ip=192.168.0.103 |
> {code}
> The client eventually exits here
> {code}
> 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR
> ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) -
> LOADGEN: Read key:pool-224-thread-6_330651 failed with exception
> ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) -
> LOADGEN: Exiting due to exception
> {code}