Ashish Kumar created HDDS-11714:
-----------------------------------
Summary: resetDeletedBlockRetryCount with --all will fail and can
cause long db lock in large cluster
Key: HDDS-11714
URL: https://issues.apache.org/jira/browse/HDDS-11714
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Ashish Kumar
When resetDeletedBlockRetryCount is run with the --all option, SCM takes a
[lock|https://github.com/apache/ozone/blob/12419fae1f0418793d952227364b04f1d2c3583b/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L126],
fetches all the transactions that have reached the max retry count, and then
updates the DB to set their count to 0. In a large-scale environment the number
of such transactions can be huge, which can lead to multiple problems:
i) Holding the lock for that long can block all other normal operations.
ii) The update message is passed through Ratis, where it will fail because of its size.
Instead, we should do this operation in batches, so that the lock is held only
briefly each time and each Ratis message stays within the size limit.
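
The batching idea could be sketched roughly as below. This is only an illustration, not the actual Ozone code: the class BatchedRetryReset, the methods partition and resetInBatches, and the submitBatch callback (standing in for whatever sends one update through Ratis and writes it to the DB) are all hypothetical names. The lock is taken and released once per batch, so other operations can run between batches, and each submitted message carries at most batchSize transaction IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the Ozone code base.
final class BatchedRetryReset {
  private final ReentrantLock lock = new ReentrantLock();

  /** Splits txIds into consecutive chunks of at most batchSize elements. */
  static List<List<Long>> partition(List<Long> txIds, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<Long>> batches = new ArrayList<>();
    for (int i = 0; i < txIds.size(); i += batchSize) {
      int end = Math.min(i + batchSize, txIds.size());
      // Copy the sub-list so each batch is independent of the input list.
      batches.add(new ArrayList<>(txIds.subList(i, end)));
    }
    return batches;
  }

  /**
   * Resets the retry count batch by batch. submitBatch is an assumed
   * placeholder for "send one update through Ratis and apply it to the DB".
   * Returns the number of transaction IDs submitted.
   */
  int resetInBatches(List<Long> txIds, int batchSize,
      Consumer<List<Long>> submitBatch) {
    int submitted = 0;
    for (List<Long> batch : partition(txIds, batchSize)) {
      lock.lock(); // held only for the duration of one batch
      try {
        submitBatch.accept(batch);
        submitted += batch.size();
      } finally {
        lock.unlock(); // released so other operations can proceed
      }
    }
    return submitted;
  }
}
```

A suitable batchSize would have to be chosen so that one batch fits comfortably within the configured Ratis message size limit; the sketch leaves that value to the caller.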