ivankelly commented on a change in pull request #847: BP-23: Ledger Balancer 
(WIP)
URL: https://github.com/apache/bookkeeper/pull/847#discussion_r159519418
 
 

 ##########
 File path: site/bps/BP-23-ledger-rebalancer.md
 ##########
 @@ -0,0 +1,50 @@
+---
+title: "BP-23: ledger balancer"
+issue: https://github.com/apache/bookkeeper/846
+state: "WIP" 
+release: "x.y.z"
+---
+
+### Motivation
+
+There are typically two use cases of _Apache BookKeeper_: 
*Messaging/Streaming/Logging* style use cases and *Storage* style 
use cases.
+
+In Messaging/Streaming/Logging oriented use cases (where old ledgers/segments 
will most likely be deleted at some point), we don't actually need to 
rebalance the ledgers stored on bookies.
+
+However, in Storage oriented use cases (where data will most likely never be 
deleted), BookKeeper data might not always be placed uniformly across bookies. 
One common reason is the addition of new bookies to an existing cluster. This 
BP proposes a balancer mechanism (as a utility, and also as part of the 
AutoRecovery daemon) that analyzes ledger distribution and balances ledgers 
across bookies.
+
+It replicates ledgers to new bookies (based on resource-aware placement 
policies) until the cluster is deemed to be balanced, which means that disk 
utilization of every bookie (ratio of used space on the node to the capacity of 
the node) differs from the utilization of the cluster (ratio of used space on 
the cluster to total capacity of the cluster) by no more than a given threshold 
percentage.
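
The balance criterion in the paragraph above can be illustrated with a small sketch. This is not code from the proposal or from BookKeeper; the function names, the bookie names, and the choice to express the threshold in percentage points are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical input shape: bookie name -> (used bytes, capacity bytes).
Usage = Dict[str, Tuple[int, int]]


def cluster_utilization_pct(bookies: Usage) -> float:
    """Used space on the cluster divided by total cluster capacity, in percent."""
    total_used = sum(used for used, _ in bookies.values())
    total_capacity = sum(capacity for _, capacity in bookies.values())
    return 100.0 * total_used / total_capacity


def deviations_pct(bookies: Usage) -> Dict[str, float]:
    """Per-bookie utilization minus cluster utilization, in percentage points."""
    cluster = cluster_utilization_pct(bookies)
    return {
        name: 100.0 * used / capacity - cluster
        for name, (used, capacity) in bookies.items()
    }


def is_balanced(bookies: Usage, threshold_pct: float) -> bool:
    """True if no bookie deviates from the cluster by more than the threshold."""
    return all(abs(d) <= threshold_pct for d in deviations_pct(bookies).values())


# Assumed scenario: an empty bookie was just added to a two-bookie cluster.
before = {"bookie-1": (80, 100), "bookie-2": (70, 100), "bookie-3": (0, 100)}
# Assumed end state once ledgers have been moved to the new bookie.
after = {"bookie-1": (50, 100), "bookie-2": (50, 100), "bookie-3": (50, 100)}
```

With a 10-point threshold, `is_balanced(before, 10)` is false because the new bookie sits 50 points below the cluster utilization, while `is_balanced(after, 10)` is true.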
 
 Review comment:
   Currently in rereplication, we only replicate a fragment. I.e. if striping 
is used only one stripe is rereplicated. This doesn't make a huge amount of 
sense. We stripe to allow an increased throughput. At the rereplication stage, 
throughput isn't a concern. 
   
   Perhaps for rereplication and for this balancing, instead of copying by 
fragment, we should just copy the whole ledger, 0->lac.
   
   Also, maybe we shouldn't modify the ensemble when we do this. The ensemble 
is overloaded right now, as a list of who can vote on an entry, and a pointer 
to the location of an entry. Once the entry has been acknowledged to a user, who 
voted on that entry can't be changed, yet we do change it. Perhaps we should 
have a shadow ensemble, which is updated only when a fragment is rereplicated 
or rebalanced. This would greatly simplify ensemble change for the writer, as the 
ensemble would never change unless the ledger is being recovered, and in that 
case the ledger would be fenced anyhow, so the writer could give up. Only 
recovery would have to deal with changing ensembles.
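
To make the shadow ensemble idea concrete, here is a minimal sketch of how the two roles could be separated. It is only an illustration of the suggestion above, not BookKeeper's metadata format; `Fragment`, `shadow_ensemble`, `relocate`, and `read_locations` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fragment:
    """Hypothetical fragment metadata with the two roles kept apart."""

    first_entry_id: int
    # Who voted on the entries. A tuple, so it is never mutated after creation.
    ensemble: Tuple[str, ...]
    # Where the entries currently live. Starts as a copy of the ensemble.
    shadow_ensemble: List[str] = field(init=False)

    def __post_init__(self) -> None:
        self.shadow_ensemble = list(self.ensemble)

    def relocate(self, from_bookie: str, to_bookie: str) -> None:
        """Record that rereplication or rebalancing moved data between bookies.

        Only the shadow ensemble changes. Raises ValueError if from_bookie
        does not currently hold this fragment.
        """
        index = self.shadow_ensemble.index(from_bookie)
        self.shadow_ensemble[index] = to_bookie

    def read_locations(self) -> Tuple[str, ...]:
        """Bookies a reader should contact for this fragment."""
        return tuple(self.shadow_ensemble)


fragment = Fragment(first_entry_id=0, ensemble=("bookie-1", "bookie-2", "bookie-3"))
fragment.relocate("bookie-2", "bookie-4")
```

After the `relocate` call, `fragment.ensemble` still lists the three original voters, while `fragment.read_locations()` points readers at `bookie-4` in place of `bookie-2`.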

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
