[jira] [Created] (HDDS-11714) resetDeletedBlockRetryCount with --all will fail and can cause long db lock in large cluster

Ashish Kumar (Jira) Fri, 15 Nov 2024 00:51:24 -0800

Ashish Kumar created HDDS-11714:
-----------------------------------

             Summary: resetDeletedBlockRetryCount with --all will fail and can 
cause long db lock in large cluster
                 Key: HDDS-11714
                 URL: https://issues.apache.org/jira/browse/HDDS-11714
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Ashish Kumar



When resetDeletedBlockRetryCount is run with the --all option, SCM takes a 
[lock|https://github.com/apache/ozone/blob/12419fae1f0418793d952227364b04f1d2c3583b/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L126]
 and tries to fetch all the transactions that have reached max retry, then updates 
the DB to set their retry count to 0. In a large-scale environment the number of 
such transactions can be huge, which can lead to multiple problems.

i) While the lock is held, it can block all other normal operations.

ii) Since the update is passed through Ratis as a message, it will fail because of the message size.

Instead, we should perform this operation in batches, to avoid holding the lock 
for a long time and to avoid the Ratis message size failure.
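
A rough sketch of the batching idea follows. It is not the actual DeletedBlockLogImpl code: the class BatchedRetryReset, the method resetInBatches, the batchSize parameter and the resetBatch callback are all hypothetical names used for illustration. The callback stands in for whatever submits one bounded update through Ratis and writes it to the DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical helper, not part of Ozone: illustrates resetting retry counts
// in bounded batches instead of one large update under a single lock.
class BatchedRetryReset {
    // Stand-in for the lock SCM takes around the deleted block log.
    private static final Lock LOCK = new ReentrantLock();

    /**
     * Splits txIds into consecutive batches of at most batchSize entries and
     * passes each batch to resetBatch. The lock is taken and released per
     * batch, so other operations can run between batches.
     *
     * @return the number of batches submitted
     */
    static int resetInBatches(List<Long> txIds, int batchSize,
                              Consumer<List<Long>> resetBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < txIds.size(); start += batchSize) {
            int end = start + Math.min(batchSize, txIds.size() - start);
            // Copy the slice so each batch is an independent, bounded payload.
            List<Long> batch = new ArrayList<>(txIds.subList(start, end));
            LOCK.lock();
            try {
                // Assumption: this callback sends one size-bounded message.
                resetBatch.accept(batch);
            } finally {
                LOCK.unlock();
            }
            batches++;
        }
        return batches;
    }
}
```

In this sketch the transaction IDs are already in memory. In a real fix the IDs would presumably also need to be read from the DB one page at a time, so that neither the scan nor the update holds the lock across the whole table.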



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

