sodonnel commented on PR #3963:
URL: https://github.com/apache/ozone/pull/3963#issuecomment-1317260916
One interesting observation I have with this is related to throttling.
One goal of the new RM is to hold back work in SCM and avoid overloading the
datanodes. We are going to do that by tracking the number of queued commands
and the number pending on the DN as reported on each heartbeat, and avoid
scheduling too many commands for any one datanode.
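To illustrate the kind of throttle I mean, here is a minimal sketch - the class, field and method names are all made up for the example, as is the limit of 20; the real accounting would live wherever the command queue tracking ends up:
```
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Hypothetical sketch only - nothing here is the real RM code.
public class DatanodeCommandThrottle {

  // Made-up cap on commands outstanding against a single datanode.
  private static final int MAX_OUTSTANDING_COMMANDS = 20;

  // Commands SCM has queued but the DN has not yet acknowledged.
  private final Map<DatanodeDetails, Integer> scmQueued =
      new ConcurrentHashMap<>();
  // Queue depth each DN reported on its last heartbeat.
  private final Map<DatanodeDetails, Integer> reportedQueued =
      new ConcurrentHashMap<>();

  // Called when a heartbeat arrives carrying the DN's queue depth.
  public void updateFromHeartbeat(DatanodeDetails dn, int queuedOnDn) {
    reportedQueued.put(dn, queuedOnDn);
  }

  // RM (and the balancer) would check this before scheduling a command.
  public boolean canSchedule(DatanodeDetails dn) {
    int outstanding = scmQueued.getOrDefault(dn, 0)
        + reportedQueued.getOrDefault(dn, 0);
    return outstanding < MAX_OUTSTANDING_COMMANDS;
  }
}
```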
The balancer is going to schedule replicate and delete commands too, which
will use up some (or all) of the ReplicationManager's limit. That is not a
problem in itself, although it could mean that running the balancer and the
replication manager together will tend to slow down replication.
What limits do we have on the balancer to stop too many commands getting
scheduled on an under-used datanode? Eg if there are only a few "empty" nodes
in a large cluster, what is the maximum number of replication commands they
can receive in a given time period? I'd like to make sure that there are good
limits in the balancer to prevent it queuing too much work on the datanodes.
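Something as simple as a per-target counter, reset each balancer iteration, would be enough - again this is just a sketch with invented names, not the actual ContainerBalancer code:
```
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Illustrative only: cap the replicate commands the balancer will have
// in flight against any single target datanode.
public class BalancerTargetLimiter {

  // Made-up limit; this would really come from a balancer config.
  private static final int MAX_INFLIGHT_MOVES_PER_TARGET = 5;

  private final Map<DatanodeDetails, Integer> inflightMoves = new HashMap<>();

  public boolean targetHasCapacity(DatanodeDetails target) {
    return inflightMoves.getOrDefault(target, 0)
        < MAX_INFLIGHT_MOVES_PER_TARGET;
  }

  public void recordMoveScheduled(DatanodeDetails target) {
    inflightMoves.merge(target, 1, Integer::sum);
  }

  // Called at the start of each balancer iteration.
  public void reset() {
    inflightMoves.clear();
  }
}
```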
If I remember correctly, a significant complexity in the balancer is that it
replicates its pending moves across the SCMs. Part of the reason for this is
that a deep queue on the DN could result in stale balance commands, which the
new leader SCM is not aware of, continuing to run on the DNs. If the queue on
the DNs is kept under control, then perhaps that replication is not needed.
I also have a bit of a concern about the balancer needing to do its own
over-replication checks around the container it intends to delete. I understand
that is needed, but I wonder if we could re-use the replication manager's
over-replicated processor to do this instead. Eg, MoveManager calls into
ReplicationManager.processOverReplicated(...) and we also pass in the replica
we would like removed - it will check whether the removal is possible or not
and then schedule the delete command.
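To make that concrete, something along these lines - the method does not exist today, it is the proposal, and the signature is invented to show the idea; ContainerInfo and ContainerReplica are the existing SCM types:
```
// Sketch of the proposed delegation from MoveManager; the
// replicationManager field is assumed to be injected.
void completeMove(ContainerInfo container, ContainerReplica toRemove) {
  // RM confirms the container really is over-replicated and that
  // removing this specific replica keeps the placement policy
  // satisfied. If so, RM schedules the delete command itself;
  // otherwise the move is failed without deleting anything.
  replicationManager.processOverReplicated(container, toRemove);
}
```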
I also had a quick scan of the balancer config - is it not topology aware by
default?
```
@Config(key = "move.networkTopology.enable", type = ConfigType.BOOLEAN,
    defaultValue = "false", tags = {ConfigTag.BALANCER},
    description = "whether to take network topology into account when " +
        "selecting a target for a source. " +
        "This configuration is false by default.")
private boolean networkTopologyEnable = false;
```
Why is this? Surely we will have a lot of "failed moves" that don't pass the
placement check, certainly for EC when the number of replicas is larger.
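For reference, the check that topology awareness should enable looks something like this - the wrapper method is mine, but validateContainerPlacement on PlacementPolicy is, if I remember right, the existing SCM interface:
```
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hdds.protocol.DatanodeDetails;
import org.apache.hadoop.hdds.scm.PlacementPolicy;

// Sketch: would the replica set still satisfy the placement policy if
// we moved a replica from source to target? A topology-aware balancer
// should reject targets where this returns false, rather than letting
// the move fail the placement check later.
final class PlacementCheck {
  static boolean moveSatisfiesPlacement(PlacementPolicy policy,
      List<DatanodeDetails> currentReplicas, DatanodeDetails source,
      DatanodeDetails target, int requiredReplicas) {
    List<DatanodeDetails> afterMove = new ArrayList<>(currentReplicas);
    afterMove.remove(source);
    afterMove.add(target);
    return policy.validateContainerPlacement(afterMove, requiredReplicas)
        .isPolicySatisfied();
  }
}
```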