JacksonYao287 commented on PR #3963:
URL: https://github.com/apache/ozone/pull/3963#issuecomment-1318032210

   @siddhantsangwan thanks for the comments.
   IMHO, the container balancer, as a tool for optimizing data distribution 
across the whole cluster, should be run when the cluster is not busy with 
workload. It can be run at any time, but running it in a busy cluster is not 
recommended. If we have to run it at a busy time, we can set some parameters 
(for example, size.moved.max.per.iteration) to limit its impact on the 
cluster. So in the majority of cases, for a good cluster administrator, 
running the balancer and the replication manager together will not be a big 
problem.
   
   >What limits do we have on the balancer to stop too many commands getting 
scheduled on an under-used datanode? 
   
   It seems we do not have any limit on this for now. We could add a 
configuration item, maybe 
`max.replication.command.sent.to.one.target.per.iteration`, to limit the 
number of replication commands sent to a single target datanode in one 
iteration.
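
   A minimal sketch of how such a per-target limit could be enforced. This is 
only an illustration: `PerTargetCommandLimiter`, `tryAcquire` and 
`resetForNewIteration` are hypothetical names, not existing Ozone classes or 
methods, and datanodes are identified by plain strings for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Ozone): caps how many replication
 * commands may be sent to one target datanode within one iteration.
 */
class PerTargetCommandLimiter {
  private final int maxPerTarget;
  // Number of commands already sent to each target in the current iteration.
  private final Map<String, Integer> sentThisIteration = new HashMap<>();

  PerTargetCommandLimiter(int maxPerTarget) {
    if (maxPerTarget < 0) {
      throw new IllegalArgumentException("maxPerTarget must be >= 0");
    }
    this.maxPerTarget = maxPerTarget;
  }

  /** Records one command and returns true if the target is under its limit. */
  synchronized boolean tryAcquire(String targetDatanodeId) {
    int sent = sentThisIteration.getOrDefault(targetDatanodeId, 0);
    if (sent >= maxPerTarget) {
      return false; // limit reached, the caller should pick another target
    }
    sentThisIteration.put(targetDatanodeId, sent + 1);
    return true;
  }

  /** Clears all counters; meant to be called when a new iteration starts. */
  synchronized void resetForNewIteration() {
    sentThisIteration.clear();
  }
}
```

   With a limit of 2, the third `tryAcquire` for the same target in one 
iteration returns false, while other targets are unaffected.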
   
   >I understand that is needed, but I wonder if we could re-use the 
replication manager over-replicated-processor to do this instead. 
   
   I get your point that we'd better have all commands sent from a single 
place, which would also let the balancer share RM's traffic control.
   There are two problems I can think of for now:
   1. RM is a background service that runs periodically, but MoveManager takes 
action as soon as it is notified that an operation has completed. I am not 
sure whether it is correct to call 
`ReplicationManager.processOverReplicated(...)` while RM is sleeping.
   
   2. In the majority of cases, once a replication command completes, the 
container is over-replicated and MoveManager deletes the replica on the source 
datanode. But RM does not know whether the container is being moved. When RM 
finds that the container is over-replicated, it may delete the replica on the 
target datanode, which is not what we want.
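
   To illustrate the second problem, here is a rough sketch of the extra 
state an over-replication handler would need in order to know which replica 
to delete. `InflightMoveRegistry` and its methods are hypothetical and do not 
exist in Ozone; container IDs are modelled as `long` and datanodes as strings.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical registry (not part of Ozone) of containers that are
 * currently being moved, keyed by container ID.
 */
class InflightMoveRegistry {
  // containerId -> source datanode whose replica should be deleted.
  private final Map<Long, String> sourceByContainer = new ConcurrentHashMap<>();

  /** Remembers that a move of this container away from the source began. */
  void moveStarted(long containerId, String sourceDatanodeId) {
    sourceByContainer.put(containerId, sourceDatanodeId);
  }

  /** Forgets the move once it has completed or failed. */
  void moveFinished(long containerId) {
    sourceByContainer.remove(containerId);
  }

  /**
   * Returns the source datanode if the container is being moved, or
   * empty if over-replication has some other cause.
   */
  Optional<String> replicaToDelete(long containerId) {
    return Optional.ofNullable(sourceByContainer.get(containerId));
  }
}
```

   Without something like this lookup, a generic over-replication pass has no 
reason to prefer the source replica over the freshly copied one on the target.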
   
   
   >I also had a quick scan of the balancer config - is it not topology aware 
by default?
   
   For now, we have two `FindTargetStrategy` implementations, 
`FindTargetStrategyByUsageInfo` and `FindTargetStrategyByNetworkTopology`. In 
some cases the two strategies may conflict when choosing a target. For 
example, a datanode may have lower disk usage but be farther from a given 
source datanode. For now, we use `FindTargetStrategyByUsageInfo` by default.
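
   A small illustration of the conflict, assuming each candidate target is 
described only by its disk usage ratio and its topology distance from the 
source. `TargetChoiceSketch`, `Candidate` and both selection methods are made 
up for this example and are not the real strategy classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of target selection; not Ozone's actual strategies. */
class TargetChoiceSketch {
  static final class Candidate {
    final String id;
    final double usedRatio;        // fraction of disk space in use
    final int distanceFromSource;  // topology distance, smaller is closer

    Candidate(String id, double usedRatio, int distanceFromSource) {
      this.id = id;
      this.usedRatio = usedRatio;
      this.distanceFromSource = distanceFromSource;
    }
  }

  /** Picks the least-used candidate, ignoring distance. */
  static Candidate byUsage(List<Candidate> candidates) {
    return Collections.min(candidates,
        Comparator.comparingDouble((Candidate c) -> c.usedRatio));
  }

  /** Picks the closest candidate, ignoring usage. */
  static Candidate byDistance(List<Candidate> candidates) {
    return Collections.min(candidates,
        Comparator.comparingInt((Candidate c) -> c.distanceFromSource));
  }

  /** Two candidates for which the two orderings disagree. */
  static List<Candidate> conflictingCandidates() {
    return Arrays.asList(
        new Candidate("dn-far", 0.20, 6),    // emptier but farther away
        new Candidate("dn-near", 0.45, 2));  // fuller but closer
  }
}
```

   For the list returned by `conflictingCandidates()`, `byUsage` selects 
`dn-far` and `byDistance` selects `dn-near`, which is the kind of 
disagreement described above.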


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

