[ 
https://issues.apache.org/jira/browse/HDDS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921441#comment-16921441
 ] 

Shashikant Banerjee commented on HDDS-2076:
-------------------------------------------

From the OM logs, container 7 exists on the pipeline
c7731959-c6cd-4d98-aee2-a502f5963142:
{code:java}
2019-08-30 12:44:16,772 | INFO  | OMAudit | user=msingh | ip=127.0.0.1 | 
op=COMMIT_KEY

{blockID={containerID=7, localID=102704691060015900}, length=16384, offset=0, 
token=null, pipeline=Pipeline[ Id: c7731959-c6cd-4d98-aee2-a502f5963142, Nodes: 
6412537d-892a-4a42-b29d-cc871e03e01f{ip: 192.168.0.103, host: 192.168.0.103, 
networkLocation: /default-rack, certSerialId: 
null}e309d977-2464-4a16-8e31-b9820831456f{ip: 192.168.0.103, host: 
192.168.0.103, networkLocation: /default-rack, certSerialId: 
null}35341ac8-c24f-4d2b-afd2-d8bd6936f0c4{ip: 192.168.0.103, host: 
192.168.0.103, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, 
Factor:THREE, State:OPEN], createVersion=0}


2019-08-30 12:45:02,062 [RatisApplyTransactionExecutor 7] INFO  
interfaces.Container (KeyValueContainer.java:flushAndSyncDB(382)) - Container 7 
is synced with bcsId 3832.
2019-08-30 12:45:02,084 [RatisApplyTransactionExecutor 7] INFO  
interfaces.Container (KeyValueContainer.java:flushAndSyncDB(382)) - Container 7 
is synced with bcsId 3832.
2019-08-30 12:45:02,102 [RatisApplyTransactionExecutor 7] INFO  
interfaces.Container (KeyValueContainer.java:flushAndSyncDB(382)) - Container 7 
is synced with bcsId 3832.


{code}
However, the failing read happens after the container was closed and replicated
to datanode 205d2fcf-b2d7-4823-8322-d725caa4f175, which is not part of the
original pipeline:
{code:java}
2019-08-30 12:51:19,501 [grpc-default-executor-18] INFO  
keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - 
Operation: GetBlock : Trace ID: 357bbecea87d4330:357bbecea87d4330:0:0 : 
Message: Unable to find the block with bcsID 2515 .Container 7 bcsId is 0. : 
Result: UNKNOWN_BCSID
2019-08-30 12:51:19,501 [grpc-default-executor-18] INFO  
keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - 
Operation: GetBlock : Trace ID: 357bbecea87d4330:357bbecea87d4330:0:0 : 
Message: Unable to find the block with bcsID 2515 .Container 7 bcsId is 0. : 
Result: UNKNOWN_BCSID
2019-08-30 12:51:19,501 | ERROR | DNAudit | user=null | ip=null | op=GET_BLOCK 
{blockData=conID: 7 locID: 102704691125289777 bcsId: 2515} | ret=FAILURE | 
java.lang.Exception: Unable to find the block with bcsID 2515 .Container 7 
bcsId is 0.
 at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
 at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
2019-08-30 12:51:19,501 [pool-224-thread-6] ERROR scm.XceiverClientGrpc 
(XceiverClientGrpc.java:sendCommandWithRetry(293)) - Failed to execute command 
cmdType: GetBlock
traceID: "357bbecea87d4330:357bbecea87d4330:0:0"
containerID: 7
datanodeUuid: "205d2fcf-b2d7-4823-8322-d725caa4f175"
getBlock {
  blockID {
    containerID: 7
    localID: 102704691125289777
    blockCommitSequenceId: 2515
  }
}


{code}
This implies that, after the close-container operation completed and the
datanodes restarted, the replica copied by replicaManager was corrupted, the
corrupted replica was reported to SCM, and a read request served from that
replica failed.
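
To make the failure mode concrete, here is a minimal sketch of the read-side
check that the log messages suggest. It is a simplified model, not the Ozone
implementation: the class and method names are hypothetical, and the rule
itself (a replica can serve a block only if its container bcsId is at least
the blockCommitSequenceId in the request) is an assumption inferred from the
error text. The numbers come from the logs: the request asks for bcsId 2515,
the container was synced with bcsId 3832 before it was closed, and the replica
on 205d2fcf-b2d7-4823-8322-d725caa4f175 reports bcsId 0.

```java
// Hypothetical, simplified model of the check behind UNKNOWN_BCSID.
// None of these names are taken from the Ozone code base.
final class BcsIdCheckSketch {

    /** Stands in for the UNKNOWN_BCSID result seen in the datanode log. */
    static final class UnknownBcsIdException extends Exception {
        UnknownBcsIdException(String message) {
            super(message);
        }
    }

    /**
     * Assumed rule: reject the read when the replica's container bcsId is
     * lower than the blockCommitSequenceId carried by the GetBlock request.
     */
    static void validateRead(long containerId, long replicaBcsId,
                             long requestedBcsId) throws UnknownBcsIdException {
        if (replicaBcsId < requestedBcsId) {
            throw new UnknownBcsIdException(
                "Unable to find the block with bcsID " + requestedBcsId
                    + " .Container " + containerId
                    + " bcsId is " + replicaBcsId + ".");
        }
    }

    public static void main(String[] args) {
        long containerId = 7L;
        long requestedBcsId = 2515L;   // blockCommitSequenceId in the request

        // Replica state as logged before the container was closed.
        long syncedReplicaBcsId = 3832L;
        // Replica state reported by datanode 205d2fcf-... after replication.
        long copiedReplicaBcsId = 0L;

        for (long replicaBcsId : new long[] {syncedReplicaBcsId, copiedReplicaBcsId}) {
            try {
                validateRead(containerId, replicaBcsId, requestedBcsId);
                System.out.println("replica bcsId " + replicaBcsId + ": read allowed");
            } catch (UnknownBcsIdException e) {
                System.out.println("replica bcsId " + replicaBcsId + ": " + e.getMessage());
            }
        }
    }
}
```

Under this model, a replica that kept bcsId 3832 would have served the read,
so the open question is why the copied replica reports bcsId 0.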

We need to investigate further why the corruption is seen in the replicated
container.

> Read fails because the block cannot be located in the container
> ---------------------------------------------------------------
>
>                 Key: HDDS-2076
>                 URL: https://issues.apache.org/jira/browse/HDDS-2076
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Client, Ozone Datanode
>    Affects Versions: 0.4.0
>            Reporter: Mukul Kumar Singh
>            Assignee: Shashikant Banerjee
>            Priority: Blocker
>              Labels: MiniOzoneChaosCluster
>         Attachments: log.zip
>
>
> Read fails as the client is not able to read the block from the container.
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
> 2019-08-30 12:51:20,081 | INFO  | SCMAudit | user=msingh | ip=192.168.0.103 |
> {code}
> The client eventually exits here
> {code}
> 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR 
> ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - 
> LOADGEN: Read key:pool-224-thread-6_330651 failed with exception
> ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - 
> LOADGEN: Exiting due to exception
> {code}



