sodonnel commented on PR #3963:
URL: https://github.com/apache/ozone/pull/3963#issuecomment-1317260916

   One interesting observation I have with this is related to throttling.
   
   One goal of the new RM is to hold back work in SCM and avoid overloading the 
datanodes. We are going to do that by tracking the number of queued commands 
and the number pending on the DN on each heartbeat, and avoiding scheduling too 
many commands for any one datanode.
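   
   As a rough sketch, this is the kind of per-datanode throttle I mean - all of 
the names and the limit here are made up for illustration, not the actual 
ReplicationManager or NodeManager code:
   
   ```
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   /**
    * Illustrative throttle only. The idea: cap the commands outstanding per
    * datanode using the queue depth it reported on its last heartbeat plus
    * anything SCM has scheduled for it since then.
    */
   public class DatanodeCommandThrottle {
     private final int maxPendingPerDatanode;
     // Queue depth per DN uuid, as reported on the last heartbeat.
     private final Map<String, Integer> reportedQueueDepth = new ConcurrentHashMap<>();
     // Commands SCM has scheduled for the DN since that heartbeat.
     private final Map<String, Integer> scheduledSinceHeartbeat = new ConcurrentHashMap<>();

     public DatanodeCommandThrottle(int maxPendingPerDatanode) {
       this.maxPendingPerDatanode = maxPendingPerDatanode;
     }

     /** Called when a heartbeat arrives carrying the DN's current command queue size. */
     public void onHeartbeat(String datanodeUuid, int queuedOnDatanode) {
       reportedQueueDepth.put(datanodeUuid, queuedOnDatanode);
       scheduledSinceHeartbeat.put(datanodeUuid, 0);
     }

     /** Returns true and counts the command if the DN is still under its limit. */
     public synchronized boolean tryScheduleCommand(String datanodeUuid) {
       int pending = reportedQueueDepth.getOrDefault(datanodeUuid, 0)
           + scheduledSinceHeartbeat.getOrDefault(datanodeUuid, 0);
       if (pending >= maxPendingPerDatanode) {
         return false; // back off - the DN already has enough queued work
       }
       scheduledSinceHeartbeat.merge(datanodeUuid, 1, Integer::sum);
       return true;
     }
   }
   ```
   
   If the balancer's commands went through the same check, it would share the 
same limit rather than being able to queue unbounded work on a DN.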
   
   The balancer is going to schedule replicate and delete commands too, which 
will use up some (or all) of the ReplicationManager's limit. That is not a 
problem in itself, although it could mean that running the balancer and the 
replication manager together will tend to slow down replication.
   
   What limits do we have in the balancer to stop too many commands getting 
scheduled on an under-used datanode? E.g. if there are only a few "empty" nodes 
in a large cluster, what is the max number of replication commands they can 
receive in a given time period? I'd like to make sure that there are good 
limits in the balancer to prevent it queuing too much work on the datanodes.
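   
   To make that concrete, the kind of per-target cap I have in mind looks 
something like the sketch below - the names and the byte limit are invented for 
illustration, not taken from the existing ContainerBalancer code:
   
   ```
   import java.util.HashMap;
   import java.util.Map;

   /**
    * Illustrative only: cap how much data the balancer may schedule into any
    * single target datanode within one iteration, so a handful of empty nodes
    * cannot be flooded with replication commands.
    */
   public class TargetDatanodeLimiter {
     private final long maxBytesEnteringTargetPerIteration;
     private final Map<String, Long> bytesScheduledToTarget = new HashMap<>();

     public TargetDatanodeLimiter(long maxBytesEnteringTargetPerIteration) {
       this.maxBytesEnteringTargetPerIteration = maxBytesEnteringTargetPerIteration;
     }

     /** Returns true and records the move if the target still has headroom this iteration. */
     public boolean tryReserve(String targetUuid, long containerBytes) {
       long scheduled = bytesScheduledToTarget.getOrDefault(targetUuid, 0L);
       if (scheduled + containerBytes > maxBytesEnteringTargetPerIteration) {
         return false; // this target has had enough moves scheduled already
       }
       bytesScheduledToTarget.put(targetUuid, scheduled + containerBytes);
       return true;
     }

     /** Clear the counters at the start of each balancer iteration. */
     public void reset() {
       bytesScheduledToTarget.clear();
     }
   }
   ```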
   
   If I remember correctly, a significant complexity in the balancer is that it 
replicates its pending moves across the SCMs, and part of the reason for this 
is that a deep queue on the DN could result in stale balance commands 
continuing to run on the DNs without the new leader SCM being aware of them. If 
the queue on the DNs is kept under control, then perhaps that replication is 
not needed.
   
   I also have a bit of a concern about the balancer needing to do some 
over-replication checks around the container it intends to delete. I understand 
that is needed, but I wonder if we could re-use the replication manager 
over-replicated-processor to do this instead. E.g. MoveManager calls into 
ReplicationManager.processOverReplicated(...), also passing in the replica we 
would like removed - it would check whether that is possible and then schedule 
the delete command.
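   
   Something along these lines is what I'm picturing - the signature below is 
invented just to show the shape of the call, not the current API:
   
   ```
   /**
    * Sketch only: a hypothetical entry point that MoveManager could call
    * instead of doing its own over-replication checks.
    */
   public interface OverReplicationCheck {

     /**
      * Verify that removing replicaToDelete from the container would not
      * violate replication or placement requirements, and if it is safe,
      * schedule the delete command on that datanode.
      *
      * @return true if the delete was scheduled, false if the replica cannot
      *         be removed yet (the caller can retry or abandon the move).
      */
     boolean processOverReplicated(long containerId, String replicaToDelete);
   }
   ```
   
   MoveManager would then just call this when it wants the source replica 
removed, and all of the placement logic stays in the replication manager.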
   
   I also had a quick scan of the balancer config - is it not topology aware by 
default?
   
   ```
     @Config(key = "move.networkTopology.enable", type = ConfigType.BOOLEAN,
         defaultValue = "false", tags = {ConfigTag.BALANCER},
         description = "whether to take network topology into account when " +
             "selecting a target for a source. " +
             "This configuration is false by default.")
     private boolean networkTopologyEnable = false;
   ```
   
   Why is this? Surely we will have a lot of "failed moves" that don't pass the 
placement check, certainly for EC when the number of replicas is larger.
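   
   For reference, turning it on would presumably just be a config change like 
the below - I'm assuming the balancer config group uses the 
"hdds.container.balancer" prefix, so correct me if the full key is different:
   
   ```
   import org.apache.hadoop.hdds.conf.OzoneConfiguration;

   public class EnableTopologyAwareMoves {
     public static void main(String[] args) {
       OzoneConfiguration conf = new OzoneConfiguration();
       // Full key assumed to be the balancer config group prefix plus the key above.
       conf.setBoolean("hdds.container.balancer.move.networkTopology.enable", true);
       System.out.println(
           conf.getBoolean("hdds.container.balancer.move.networkTopology.enable", false));
     }
   }
   ```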

