[GitHub] [ozone] ashishkumar50 commented on pull request #5293: HDDS-4368. SCM should avoid sending delete transactions for under-replicated containers

via GitHub Thu, 14 Sep 2023 07:40:41 -0700


ashishkumar50 commented on PR #5293:
URL: https://github.com/apache/ozone/pull/5293#issuecomment-1719589461

> It feels like we should be storing the DNs we sent the commands to, and
checking off the replies against what we sent, rather than expecting replies
from "all the current replica DNs", as the current replicas could have changed
between when the command was issued and when the replies were received.

SCM already keeps an in-memory record of which DNs a command has been sent to
and which DNs have already responded, so that it does not retry against a DN
that has responded successfully.
This
[memory](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L423) (a
map) is kept until all the replicas have responded, because only after that
can we remove the deleteTransactionID from RocksDB.
If the current replicas have changed after the command was sent, the next retry
sends the delete request to the changed replica DNs.
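
To make the bookkeeping concrete, here is a minimal sketch of the idea, not the actual `DeletedBlockLogImpl` code. The class `DeleteAckTracker`, its method names, and the use of plain strings for DN IDs are all hypothetical, and the non-empty check on the replica set is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for SCM's in-memory delete bookkeeping. */
final class DeleteAckTracker {
  // txID -> DNs that have already acknowledged the delete (illustrative types).
  private final Map<Long, Set<String>> ackedDnsByTx = new HashMap<>();

  /** Records that a DN has acknowledged the delete for a transaction. */
  void recordAck(long txId, String dn) {
    ackedDnsByTx.computeIfAbsent(txId, k -> new HashSet<>()).add(dn);
  }

  /**
   * Returns true, and drops the in-memory entry, once every DN in the
   * current replica set has acknowledged. Only at that point would the
   * transaction be safe to remove from the persistent store.
   */
  boolean completeIfAllAcked(long txId, Set<String> currentReplicaDns) {
    Set<String> acked =
        ackedDnsByTx.getOrDefault(txId, Collections.<String>emptySet());
    // Assumption: an empty replica set never counts as "all acked".
    if (!currentReplicaDns.isEmpty() && acked.containsAll(currentReplicaDns)) {
      ackedDnsByTx.remove(txId);
      return true;
    }
    return false;
  }

  /** True while the transaction still has an in-memory entry. */
  boolean isTracked(long txId) {
    return ackedDnsByTx.containsKey(txId);
  }
}
```

With two of three replicas acknowledged, `completeIfAllAcked` returns false and the entry stays in memory; it only completes after the third ack arrives.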

> I think there should be a timeout, after which it discards the pending
commands, and sends again to the current replica list and waits for a reply
there. What do you think?

On every retry, SCM takes the replicas from
[containerManager](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L271)
and sends the delete request to those DNs, so it always uses the current replicas.
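
As a rough illustration of why a retry always follows the current replicas, the target list can be recomputed on each attempt from the latest replica set minus the DNs that have already responded. `RetryTargets` and `forRetry` are hypothetical names for this sketch and do not exist in Ozone.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: recomputes delete targets on every retry. */
final class RetryTargets {
  private RetryTargets() { }

  /**
   * DNs to contact on this retry: the replicas that exist right now,
   * minus the DNs that have already acknowledged the delete.
   */
  static Set<String> forRetry(Set<String> currentReplicaDns,
                              Set<String> ackedDns) {
    Set<String> targets = new LinkedHashSet<>(currentReplicaDns);
    targets.removeAll(ackedDns);
    return targets;
  }
}
```

Because the replica set is read fresh each time, a DN that left the replica set is simply no longer a target, and a newly created replica is picked up automatically.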

> What about, we have 3 replicas. We send 3 deletes, then one node goes down
making it under replicated. We never get a reply from that node. Replication
makes a 3rd copy before checking. This was not under-replicated at the start
and the change here would not help I think.

This behaviour is the same before and after this change. During retry, SCM
sends the delete request to the new replica.

### Current behaviour:
Assume replication factor: 3.
If only 2 replicas are currently present, SCM sends the delete to those 2 DNs.
The entry remains in
memory ([transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282))
because it requires 3 responses.
If another replica comes up after a number of retries, SCM sends the delete to
the third replica DN.
The entry is then removed from memory and afterwards from RocksDB.

### Behaviour after this PR:
Assume replication factor: 3.
If only 1-2 replicas are currently present, SCM waits until the replica count
reaches the replication factor before sending the delete.
Once the replication factor is satisfied, it sends the delete to all 3 DNs.
In most cases this avoids keeping entries in
[transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282)
for a long time.
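
The gating rule described above can be sketched as a single check. This is only an illustration under stated assumptions: `DeleteGate` and `shouldSendDelete` are hypothetical names, and treating a replica count above the replication factor as sendable is an assumption, not something confirmed by the PR.

```java
/** Hypothetical sketch of the "wait for full replication" rule. */
final class DeleteGate {
  private DeleteGate() { }

  /**
   * Returns true when the delete may be sent to the DNs.
   * Assumption: a replica count above the factor also qualifies;
   * the comment above only describes the "fewer than factor" case.
   */
  static boolean shouldSendDelete(int currentReplicas, int replicationFactor) {
    return currentReplicas >= replicationFactor;
  }
}
```

With replication factor 3, a container with 1 or 2 replicas is skipped on this pass and retried later; once it has 3 replicas, the delete goes to all of them in one round.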

