siddhantsangwan commented on PR #3963:
URL: https://github.com/apache/ozone/pull/3963#issuecomment-1337180696
> @siddhantsangwan can you explain in more detail why the balancer needs to
replicate its commands? I think you are familiar with this part.
Balancer's configurations are replicated so that they're available in the new
SCM on failover. As for moves being replicated, I think the original intent was
to provide consistency in scenarios such as this:
The old leader scheduled a move from DN1 to DN2. The replication succeeded, and
then there's a failover. How can we ensure that the new leader deletes the
replica from DN1 and not from some other DN?
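Roughly, the Ratis approach makes this work as follows. This is a minimal
sketch; the class and method names are illustrative, not Ozone's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a move's (src, target) pair is committed through the Ratis state
// machine before any command is issued, so every SCM replica has it and a
// new leader can recover it after failover.
public class MoveScheduler {
  record Move(String srcDatanode, String targetDatanode) {}

  /** Pending moves keyed by container id, populated by applying Ratis log entries. */
  private final Map<Long, Move> pendingMoves = new ConcurrentHashMap<>();

  /** Called when the Ratis log entry for a move is applied on this SCM. */
  void applyMove(long containerId, String src, String target) {
    pendingMoves.put(containerId, new Move(src, target));
  }

  /**
   * After a failover the new leader consults the replicated map instead of
   * guessing: the pending delete must target the recorded source (DN1 in the
   * example above), never some other replica holder.
   */
  String datanodeToDeleteFrom(long containerId) {
    Move move = pendingMoves.get(containerId);
    return move == null ? null : move.srcDatanode();
  }
}
```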
Back then, the Ratis approach was selected to ensure consistency. But some
other approaches were also considered. From the move design doc:
> Approach 1.2
Maintain a list of moves, i.e. {cid, src_dn, target_dn}. Once the replica set
for container cid has both src_dn and target_dn and the container is
over-replicated, deletion can be scheduled for the replica on src_dn. It would
be better for RM to have a map called inflightMoves that tracks such move
requests.
In the worst case, the container can become under-replicated, i.e. it could end
up with 2 replicas instead of 3. Let's say container c1 is being moved from d2
to d4. RM1 sees replicas {d1, d2, d3, d4} and schedules deletion for d2.
RM2 (the new leader) also sees replicas {d1, d2, d3, d4} (the deletion has not
yet completed) and schedules deletion for d1 based on a deterministic algorithm
for handling over-replicated containers. In this case the remaining replicas
would be {d3, d4}.
In case of no failovers, the move should be successful. On a failover, the move
map must be cleared.
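As a rough sketch of Approach 1.2 (only the inflightMoves name comes from the
doc; the surrounding class and method names are illustrative):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of Approach 1.2: RM tracks balancer move requests and uses them
// when deciding which replica of an over-replicated container to delete.
class ReplicationManagerSketch {
  record Move(String src, String target) {}

  /** {cid -> (src_dn, target_dn)} for moves the balancer has requested. */
  private final Map<Long, Move> inflightMoves = new ConcurrentHashMap<>();

  void recordMove(long containerId, String src, String target) {
    inflightMoves.put(containerId, new Move(src, target));
  }

  /**
   * Picks the replica to delete for an over-replicated container. If an
   * in-flight move exists and both src and target currently hold a replica,
   * the source replica is the one to remove.
   */
  String chooseReplicaToDelete(long containerId, Set<String> replicaHolders) {
    Move move = inflightMoves.get(containerId);
    if (move != null
        && replicaHolders.contains(move.src())
        && replicaHolders.contains(move.target())) {
      return move.src();
    }
    // Otherwise fall back to the normal deterministic over-replication handling.
    return replicaHolders.iterator().next();
  }

  /** Per the doc: on a failover, the move map must be cleared. */
  void onFailover() {
    inflightMoves.clear();
  }
}
```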
I think @sodonnel is suggesting something similar here. We can avoid the
worst-case scenario if the balancer waits (like RM does) after a failover and
datanodes drop commands from the old leader (sketched below).
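For the "datanodes drop commands from the old leader" part, one common way to
achieve this is term fencing: stamp every SCM command with the leader's term
and have datanodes reject anything with a stale term. This is a generic sketch,
not necessarily how Ozone implements it:

```java
// Sketch of term fencing on the datanode side. The class and field names
// are illustrative, not Ozone's actual command-handling classes.
class CommandFilter {
  private long highestTermSeen = 0;

  /** Accept a command only if it comes from the newest leader seen so far. */
  synchronized boolean accept(long commandTerm) {
    if (commandTerm < highestTermSeen) {
      return false; // stale command from a deposed leader: drop it
    }
    highestTermSeen = commandTerm;
    return true;
  }
}
```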
> Additionally, if we changed the over-replication logic so that it prefers
to delete a replica from a node with less free space than the others, then
replication becomes somewhat self-balancing, and perhaps the delete part of the
balancer isn't needed. As it stands, we cannot make that over-replication
optimization in SCM, as the delete order is deterministic.
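For reference, that optimization could look something like the following. This
is a sketch under the assumption that SCM can see per-node free space from
heartbeats; the class name and types are illustrative:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch: when a container is over-replicated, prefer deleting the replica
// on the node with the least free space (the most used node).
class OverReplicationHandlerSketch {
  /** Free bytes per datanode, e.g. as reported in node heartbeats. */
  private final Map<String, Long> freeSpace;

  OverReplicationHandlerSketch(Map<String, Long> freeSpace) {
    this.freeSpace = freeSpace;
  }

  /** Returns the datanode whose excess replica should be deleted. */
  String chooseReplicaToDelete(List<String> replicaHolders) {
    return replicaHolders.stream()
        .min(Comparator.comparingLong(dn -> freeSpace.getOrDefault(dn, 0L)))
        .orElseThrow();
  }
}
```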
@sodonnel, from the move doc:
> There is a problem with deleting the replica from the most used datanode.
Let's consider a scenario: containers c1 and c2, each with replicas on
{d1, d2, d3}. In terms of utilisation, d1 > d2 > d3. Balancer schedules moves
c1 from d1 to d4 and c2 from d2 to d4, given a limit of 1 on the maximum moves
from a source datanode. In the Replication Manager, when the copies are
complete, the replicas of both c1 and c2 will be deleted from d1 only. The
expected behavior is for c1 to be deleted from d1 and c2 to be deleted from d2.
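Running that scenario through the over-replication sketch above makes the
problem visible: after both copies complete, each container's replica set is
{d1, d2, d3, d4}, and a "delete from the most used node" rule picks d1 both
times. (d1 being the most utilised is modelled here as having the least free
space.)

```java
import java.util.List;
import java.util.Map;

// Illustrates the scenario from the move doc: both deletions target d1,
// even though the balancer meant c2's delete for d2.
class MostUsedDeletionPitfall {
  public static void main(String[] args) {
    // Utilisation d1 > d2 > d3 > d4, so free space d1 < d2 < d3 < d4.
    Map<String, Long> free = Map.of("d1", 10L, "d2", 20L, "d3", 30L, "d4", 40L);
    OverReplicationHandlerSketch handler = new OverReplicationHandlerSketch(free);

    List<String> c1Replicas = List.of("d1", "d2", "d3", "d4"); // after d1 -> d4 copy
    List<String> c2Replicas = List.of("d1", "d2", "d3", "d4"); // after d2 -> d4 copy

    System.out.println(handler.chooseReplicaToDelete(c1Replicas)); // d1 (as intended)
    System.out.println(handler.chooseReplicaToDelete(c2Replicas)); // d1 (should be d2)
  }
}
```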