[
https://issues.apache.org/jira/browse/HDDS-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754805#comment-17754805
]
Siyao Meng commented on HDDS-9146:
----------------------------------
[~szetszwo] Yes, it is a sneaky one. I suggest that from now on, hsync tests check the
entries (blocks) added to deletedTable, to prevent similar issues.
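A minimal sketch of such a check for an hsync test follows. It assumes the block IDs of the committed key in keyTable and of its deletedTable entries have already been collected into lists. `BlockRef` and `findSharedBlocks` are hypothetical names, not Ozone APIs; `BlockRef` only stands in for a block's (container ID, local ID) pair. bcsId is left out of the comparison because, in the test log of this issue, the same block appears with bcsId 2 in keyTable and bcsId 0 in deletedTable.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeletedTableCheck {

  // Hypothetical stand-in for a block identifier: container ID + local ID.
  // bcsId is deliberately not part of the identity.
  record BlockRef(long containerId, long localId) { }

  // Returns every block that is both referenced by the live key and queued
  // in deletedTable. An hsync test could assert this set is empty.
  static Set<BlockRef> findSharedBlocks(List<BlockRef> keyTableBlocks,
                                        List<BlockRef> deletedTableBlocks) {
    Set<BlockRef> live = new HashSet<>(keyTableBlocks);
    Set<BlockRef> shared = new HashSet<>();
    for (BlockRef b : deletedTableBlocks) {
      if (live.contains(b)) {
        shared.add(b);
      }
    }
    return shared;
  }

  public static void main(String[] args) {
    // Block IDs taken from the test log in the issue description.
    List<BlockRef> keyTable = List.of(new BlockRef(1L, 111677748019200001L));
    List<BlockRef> deletedTable = List.of(new BlockRef(1L, 111677748019200001L));
    Set<BlockRef> shared = findSharedBlocks(keyTable, deletedTable);
    if (!shared.isEmpty()) {
      System.out.println("Blocks queued for deletion are still referenced: " + shared);
    }
  }
}
```

In a real test the lists would be filled by iterating keyTable/fileTable and deletedTable after each commit; how that is done is left out here.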
> Potential data loss with HSync due to deletedTable entry having the same
> block as keyTable entry's
> --------------------------------------------------------------------------------------------------
>
> Key: HDDS-9146
> URL: https://issues.apache.org/jira/browse/HDDS-9146
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.4.0
>
>
> It has been observed that when {{hsync()}} is called followed by {{close()}} on a key
> stream (which triggers two {{OMKeyCommitRequest}}s, the first with
> {{isHSync = true}} and the second with {{isHSync = false}}),
> {{deletedTable}} can end up with an entry that has exactly the same block {{conID}}
> (container ID) and {{locId}} (local ID) as the committed key in {{keyTable}}.
> This can cause OM's {{KeyDeletingService}} to ask SCM to delete the
> committed block by mistake.
> The catch is that actual data loss won't happen until the container is closed;
> only then does block deletion actually happen on DNs. CMIIW [~erose]
> Repro integration test branch (based on [~erose]'s integration test, which was
> in turn based on my initial draft):
> https://github.com/smengcl/hadoop-ozone/tree/HDDS-9146-repro
> Run the integration test {{TestMiniOzoneCluster#testKeyRenameDirDelete}} to
> reproduce:
> {code:title=Test log. Note that the entries in keyTable and deletedTable share the same block conID: 1 and locID: 111677748019200001}
> 2023-08-09 14:31:54,859 [main] WARN ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(159)) - keyTable: ----- START -----
> 2023-08-09 14:31:54,860 [main] WARN ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(168)) - keyTable: key = /testozonevol/testozonebucket/inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001, val = OmKeyInfo{volumeName='testozonevol', bucketName='testozonebucket', keyName='inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001', dataSize=11, keyLocationVersions=[OmKeyLocationInfoGroup{version=0, locationVersionMap={0=[{blockID={conID: 1 locID: 111677748019200001 bcsId: 2}, length=11, offset=0, token=null, pipeline=null, createVersion=0, partNumber=0}]}, isMultipartKey=false}], creationTime=1691616714661, modificationTime=1691616714848, replicationConfig=RATIS/THREE, encInfo=null, fileChecksum=null, isFile=true, fileName='part-m-00001'}
> 2023-08-09 14:31:54,860 [main] WARN ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(171)) - keyTable: ----- END -----
> 2023-08-09 14:31:54,860 [main] WARN ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(173)) - deletedTable: ----- START -----
> 2023-08-09 14:31:54,861 [main] WARN ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(181)) - deletedTable: key = /testozonevol/testozonebucket/inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001/-9223372036854774528, val = RepeatedOmKeyInfo{omKeyInfoList=[OmKeyInfo{volumeName='testozonevol', bucketName='testozonebucket', keyName='inputTera/_temporary/1/_temporary/attempt_1691047336995_0006_m_000001_0/part-m-00001', dataSize=11, keyLocationVersions=[OmKeyLocationInfoGroup{version=0, locationVersionMap={0=[{blockID={conID: 1 locID: 111677748019200001 bcsId: 0}, length=11, offset=0, token=null, pipeline=null, createVersion=0, partNumber=0}]}, isMultipartKey=false}], creationTime=1691616714661, modificationTime=1691616714834, replicationConfig=RATIS/THREE, encInfo=null, fileChecksum=null, isFile=true, fileName='part-m-00001'}]}
> 2023-08-09 14:31:54,861 [main] WARN ozone.TestMiniOzoneCluster (TestMiniOzoneCluster.java:testKeyRenameDirDelete(184)) - deletedTable: ----- END -----
> {code}
> It sounds to me like the fix should be to filter out any block that shares the same
> containerId and locId as the keyTable/fileTable entry when adding to
> deletedTable inside OMKeyCommitRequest / OMKeyCommitRequestWithFSO. But I'm
> no expert in HSync, so please advise. cc [~weichiu] [~szetszwo]
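The filtering proposed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual change to OMKeyCommitRequest / OMKeyCommitRequestWithFSO: `BlockRef` and `blocksSafeToDelete` are hypothetical names, and each key version is reduced to a plain list of (container ID, local ID) pairs. Only blocks of the previous version that are not referenced by the entry being committed would be added to deletedTable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommitBlockFilter {

  // Hypothetical block identifier matching "conID" / "locID" in the log.
  record BlockRef(long containerId, long localId) { }

  // Keeps only the old blocks that the newly committed entry no longer uses.
  // Blocks sharing containerId and localId with the committed entry are
  // filtered out, so they are never handed to the deletion path.
  static List<BlockRef> blocksSafeToDelete(List<BlockRef> oldVersionBlocks,
                                           List<BlockRef> committedBlocks) {
    Set<BlockRef> stillReferenced = new HashSet<>(committedBlocks);
    List<BlockRef> toDelete = new ArrayList<>();
    for (BlockRef b : oldVersionBlocks) {
      if (!stillReferenced.contains(b)) {
        toDelete.add(b);
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    BlockRef shared = new BlockRef(1L, 111677748019200001L);   // block from the log
    BlockRef stale = new BlockRef(1L, 111677748019200002L);    // hypothetical unused block
    // Previous (hsync'ed) version held {shared, stale}; the final commit keeps {shared}.
    List<BlockRef> toDelete = blocksSafeToDelete(List.of(shared, stale), List.of(shared));
    System.out.println(toDelete); // [BlockRef[containerId=1, localId=111677748019200002]]
  }
}
```

Comparing on the (container ID, local ID) pair alone, rather than on the full block metadata, matches the observation in the log that the colliding entries differ only in bcsId.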
--
This message was sent by Atlassian Jira
(v8.20.10#820010)