ivankelly commented on a change in pull request #847: BP-23: Ledger Balancer (WIP)
URL: https://github.com/apache/bookkeeper/pull/847#discussion_r159609443
 
 

 ##########
 File path: site/bps/BP-23-ledger-rebalancer.md
 ##########
 @@ -0,0 +1,50 @@
+---
+title: "BP-23: ledger balancer"
+issue: https://github.com/apache/bookkeeper/issues/846
+state: "WIP" 
+release: "x.y.z"
+---
+
+### Motivation
+
+There are two typical use cases of _Apache BookKeeper_: *Messaging/Streaming/Logging* style use cases and *Storage* style use cases.
+
+In Messaging/Streaming/Logging oriented use cases (where old ledgers/segments will most likely be deleted at some point), we don't actually need to rebalance the ledgers stored on bookies.
+
+However, in Storage oriented use cases (where data will most likely never be deleted), BookKeeper data might not always be placed uniformly across bookies. One common reason is the addition of new bookies to an existing cluster. This proposal provides a balancer mechanism (as a utility, and also as part of the AutoRecovery daemon) that analyzes the distribution of ledgers and balances them across bookies.
+
+The balancer replicates ledgers to new bookies (based on resource-aware placement policies) until the cluster is deemed balanced, meaning that the disk utilization of every bookie (the ratio of used space on the node to the capacity of the node) differs from the utilization of the cluster (the ratio of used space in the cluster to the total capacity of the cluster) by no more than a given threshold percentage.
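
The balance criterion described above can be sketched as a simple check. This is a minimal illustration, assuming per-bookie used space and capacity are already known, and interpreting "threshold percentage" as an absolute difference in utilization expressed as a fraction (0.10 for 10 percentage points). `BalanceCheck`, `BookieUsage`, and `isBalanced` are hypothetical names, not part of the BookKeeper API.

```java
import java.util.List;

/** Hypothetical sketch of the BP-23 balance criterion; not BookKeeper API. */
public class BalanceCheck {

    /** Used space and capacity of one bookie, in bytes (capacity assumed > 0). */
    record BookieUsage(String bookieId, long usedBytes, long capacityBytes) {
        double utilization() {
            return (double) usedBytes / capacityBytes;
        }
    }

    /** Cluster utilization: total used space divided by total capacity. */
    static double clusterUtilization(List<BookieUsage> bookies) {
        long used = 0;
        long capacity = 0;
        for (BookieUsage b : bookies) {
            used += b.usedBytes();
            capacity += b.capacityBytes();
        }
        return (double) used / capacity;
    }

    /**
     * Balanced when every bookie's utilization is within `threshold`
     * (a fraction, e.g. 0.10 for 10 percentage points) of the cluster's.
     */
    static boolean isBalanced(List<BookieUsage> bookies, double threshold) {
        double cluster = clusterUtilization(bookies);
        for (BookieUsage b : bookies) {
            if (Math.abs(b.utilization() - cluster) > threshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two existing bookies at 80% plus a newly added, empty bookie.
        List<BookieUsage> bookies = List.of(
                new BookieUsage("bookie-1", 800, 1000),
                new BookieUsage("bookie-2", 800, 1000),
                new BookieUsage("bookie-3", 0, 1000));
        System.out.println(isBalanced(bookies, 0.10)); // false
    }
}
```

In the example, cluster utilization is 1600/3000 (about 53%), so both the 80% bookies and the empty new bookie fall outside a 10-point threshold and `isBalanced` returns `false`; the balancer would keep replicating ledgers onto the new bookie until the check passes.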
 
 Review comment:
   > I am not sure I understand your comment here. But in the current implementation, only write failures on normal writes and ledger recovery deal with ensemble changes. Rereplication doesn't change the ensemble. The ensemble is only updated when a fragment is fully replicated.
   
   With the current implementation, if a write needs to change the ensemble and a previous fragment has been rereplicated, the ensemble-change update will fail, and the client needs to read back the metadata to check whether the ledger is fenced and merge the ensembles. This merging is messy, which is why I'm suggesting we get rid of it.
   
   With a shadow ensemble for replication, the writer changing the ensemble can assume that the ensemble it is trying to write is correct, so if the write fails, it only checks whether the ledger has been fenced, and if not, retries the write with the assumed-correct ensemble. Similarly for multiple recoveries: if they have to change the ensemble [1], each one will be able to ensure that all necessary entries are on all nodes in its presumably correct ensemble, so if they fail to write, they can just try again (while checking the lastEntry in the metadata). Again, merging is unnecessary.
   
   [1] Even then, we could get rid of ensemble changes in recovery.
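
The shadow-ensemble idea above can be sketched roughly as follows. This is a minimal, single-process illustration under stated assumptions: ledger metadata is modelled as a versioned record in an in-memory compare-and-set store, writer-owned ensembles and replication-owned "shadow" ensembles live in separate fields, and fencing is a boolean flag. `ShadowEnsembleSketch`, `Metadata`, `MetadataStore`, and `changeEnsemble` are illustrative names, not BookKeeper APIs.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of the "shadow ensemble" idea; not BookKeeper API. */
public class ShadowEnsembleSketch {

    /** Versioned metadata snapshot. Entry id -> ensemble (list of bookie ids). */
    record Metadata(long version,
                    boolean fenced,
                    SortedMap<Long, List<String>> ensembles,         // owned by the writer
                    SortedMap<Long, List<String>> shadowEnsembles) { // owned by re-replication
    }

    /** In-memory stand-in for a metadata store with compare-and-set semantics. */
    static class MetadataStore {
        private Metadata current;

        MetadataStore(Metadata initial) {
            current = initial;
        }

        synchronized Metadata read() {
            return current;
        }

        /** Stores `next` with version+1 only if the stored version is still `expectedVersion`. */
        synchronized boolean compareAndSet(long expectedVersion, Metadata next) {
            if (current.version() != expectedVersion) {
                return false;
            }
            current = new Metadata(expectedVersion + 1, next.fenced(),
                    next.ensembles(), next.shadowEnsembles());
            return true;
        }
    }

    /**
     * Writer-side ensemble change starting from a possibly stale view. On a
     * version conflict it only checks for fencing; otherwise it retries with
     * its own ensembles, carrying the shadow ensembles over unchanged.
     */
    static boolean changeEnsemble(MetadataStore store, Metadata lastSeen,
                                  SortedMap<Long, List<String>> writerEnsembles) {
        Metadata seen = lastSeen;
        while (true) {
            if (seen.fenced()) {
                return false; // fenced (e.g. by recovery): the writer must stop
            }
            Metadata proposed = new Metadata(seen.version(), false,
                    writerEnsembles, seen.shadowEnsembles());
            if (store.compareAndSet(seen.version(), proposed)) {
                return true;
            }
            seen = store.read(); // conflict: re-read, no ensemble merge needed
        }
    }

    public static void main(String[] args) {
        SortedMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, List.of("b1", "b2", "b3"));
        MetadataStore store = new MetadataStore(
                new Metadata(0, false, initial, new TreeMap<>()));

        // The writer's view, which is about to become stale.
        Metadata writerView = store.read();

        // Re-replication records a new copy of fragment 0 in the shadow map only.
        Metadata m = store.read();
        SortedMap<Long, List<String>> shadow = new TreeMap<>(m.shadowEnsembles());
        shadow.put(0L, List.of("b1", "b2", "b4"));
        store.compareAndSet(m.version(),
                new Metadata(m.version(), false, m.ensembles(), shadow));

        // The writer replaces b3 with b5 for entries from 100 onward.
        SortedMap<Long, List<String>> writerEnsembles = new TreeMap<>(writerView.ensembles());
        writerEnsembles.put(100L, List.of("b1", "b2", "b5"));

        System.out.println(changeEnsemble(store, writerView, writerEnsembles)); // true
        System.out.println(store.read().version());                             // 2
        System.out.println(store.read().shadowEnsembles());                     // {0=[b1, b2, b4]}
    }
}
```

In `main`, re-replication bumps the metadata version behind the writer's back, so the writer's first compare-and-set fails. Because the ledger is not fenced, the writer simply retries with its own ensembles at the new version, and the shadow entry written by re-replication survives without any merge step. The retry loop is unbounded here for brevity.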
   
   > I am not sure it is worth changing the copying mechanism here, because rereplication and balancing would only deal with sealed fragments. I don't think there are issues with the current copying mechanism, unless I missed anything.
   
   Ya, not worth changing now. If we were to start again, though, I think I'd do it this way. That said, I'd probably get rid of ensemble changes too, and push that up to the higher-level client.
