Swapnil,
Interesting points; here are my thoughts:

> Given that there is already support for parallel rebalancing among regions,
> I do not see the value in supporting parallel rebalancing of buckets.

I think there are some advantages to parallel rebalance of a region over
rebalancing regions in parallel. The first is that the existing system does
not help users with a single region that is significantly larger than
others. The second is that the existing system could cause excessive memory
use if several regions are rebalanced in parallel, since those rebalances are
not coordinated and would run into the issue discussed in the proposal about
members receiving and giving up buckets at the same time. The third is that
specifying the number of regions to rebalance in parallel does not spread the
load evenly across the cluster, but can create hotspots on specific members
(for the reason discussed in the previous point). By specifying the maximum
number of
bucket transfers a member should concurrently be involved with, you can
specifically tune the overhead that rebalance can cause while at the same
time maximizing the number of transfers occurring at a time, especially on
larger clusters.
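
To make that concrete, here is a rough sketch of how one round of moves could
be chosen so that no member both gives and receives a bucket, while also
capping how many transfers each member is involved in at once. This is purely
illustrative; the names (BucketMove, maxConcurrentTransfersPerMember, etc.)
are hypothetical and it is not the actual rebalance code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: select one "round" of bucket moves such that no
// member both sends and receives, and no member exceeds the transfer cap.
class BucketMoveRoundPlanner {

  static class BucketMove {
    final String source;
    final String target;
    final int bucketId;

    BucketMove(String source, String target, int bucketId) {
      this.source = source;
      this.target = target;
      this.bucketId = bucketId;
    }
  }

  static List<BucketMove> planRound(List<BucketMove> candidates,
                                    int maxConcurrentTransfersPerMember) {
    List<BucketMove> round = new ArrayList<>();
    Map<String, Integer> transfersPerMember = new HashMap<>();
    Map<String, Boolean> isSender = new HashMap<>();

    for (BucketMove move : candidates) {
      // Skip moves that would make a member both a sender and a receiver in
      // this round; that is what creates the temporary memory overhead.
      if (Boolean.FALSE.equals(isSender.get(move.source))
          || Boolean.TRUE.equals(isSender.get(move.target))) {
        continue;
      }
      // Enforce the per-member cap on concurrent transfers.
      int sourceLoad = transfersPerMember.getOrDefault(move.source, 0);
      int targetLoad = transfersPerMember.getOrDefault(move.target, 0);
      if (sourceLoad >= maxConcurrentTransfersPerMember
          || targetLoad >= maxConcurrentTransfersPerMember) {
        continue;
      }
      isSender.put(move.source, true);
      isSender.put(move.target, false);
      transfersPerMember.put(move.source, sourceLoad + 1);
      transfersPerMember.put(move.target, targetLoad + 1);
      round.add(move);
    }
    return round;
  }
}

Candidate moves that get skipped would simply carry over into the next round,
so balance is still reached, just over several iterations.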

> If we end up doing this anyway, I would suggest to not rely on "parameters"
> for throttling, as these parameters would have to be configured in advance
> without knowing what the actual load looks like when rebalance is in
> progress and hence could be difficult to get right. It would be ideal if we
> can handle this using back pressure.
>
Any back pressure mechanism still requires parameterization of what the
maximum resources are. For example, back pressure on adding tasks to a
thread pool executor is based on the number of allowed threads and the
maximum backlog. Based on Mike's anecdote, users will likely have their own
definition of what counts as too much resource usage, such as the hit they
are willing to take to throughput or latency during the rebalance. I think
the right approach is to provide a good (but conservative) default and let
users increase the resource usage until they reach their optimal situation.
This means we should allow the level of parallelization to be configurable
at runtime, and likely also on an individual rebalance (when kicked off
manually). I am interested in ideas on how best to parameterize the
throttling and make it as easy as possible for users to understand,
configure, and predict the performance tradeoffs.
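
To illustrate that point, even back pressure built on a bounded thread pool
still has to be told in advance how many worker threads and how much backlog
are acceptable. A minimal sketch (hypothetical values, not Geode code) in
which the submitting thread is slowed down once the backlog fills up:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: "back pressure" via a bounded executor still
// depends on two up-front parameters, worker thread count and backlog size.
public class TransferExecutorExample {

  static ThreadPoolExecutor create(int workerThreads, int maxBacklog) {
    return new ThreadPoolExecutor(
        workerThreads, workerThreads,           // fixed-size pool
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(maxBacklog),   // bounded backlog
        // When the backlog is full, the submitting thread runs the task
        // itself, which slows further submissions (the back pressure).
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  public static void main(String[] args) {
    ThreadPoolExecutor executor = create(2, 4); // conservative defaults
    for (int i = 0; i < 20; i++) {
      final int transferId = i;
      executor.execute(() ->
          // Stand-in for an actual bucket transfer.
          System.out.println("transferring bucket " + transferId));
    }
    executor.shutdown();
  }
}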

On Fri, Mar 9, 2018 at 6:52 AM, Anthony Baker <aba...@pivotal.io> wrote:

> It would be interesting to have a way to model the behavior of the current
> algorithm and compare it to the proposal under various conditions like
> membership changes, data imbalance, etc. That would let us understand the
> optimality of the change in a concrete way.
>
> Anthony
>
> > On Mar 9, 2018, at 1:19 AM, Swapnil Bawaskar <sbawas...@pivotal.io>
> wrote:
> >
> > Given that there is already support for parallel rebalancing among
> regions,
> > I do not see the value in supporting parallel rebalancing of buckets.
> >
> > If we end up doing this anyway, I would suggest to not rely on
> "parameters"
> > for throttling, as these parameters would have to be configured in
> advance
> > without knowing what the actual load looks like when rebalance is in
> > progress and hence could be difficult to get right. It would be ideal if
> we
> > can handle this using back pressure.
> >
> >> On Thu, Mar 8, 2018 at 12:05 PM Nick Reich <nre...@pivotal.io> wrote:
> >>
> >> Mike,
> >>
> >> I think having a good default value for maximum parallel operations will
> >> play a role in not consuming too many resources. Perhaps defaulting to
> only
> >> a single (or other small number based on testing) parallel action(s) per
> >> member at a time and allowing users that want better performance to
> >> increase that number would be a good start. That should result in
> >> performance improvements, but not place increased burden on any specific
> >> member. Especially when bootstrapping new members, rebalance speed may be
> >> more valuable than usual, so making it possible to configure on a per
> >> rebalance action level would be preferred.
> >>
> >> One clarification from my original proposal: regions can already be
> >> rebalanced in parallel, depending on the value of
> resource.manager.threads
> >> (which defaults to 1, so no parallelization of regions in the default
> >> case).
> >>
> >>> On Thu, Mar 8, 2018 at 11:46 AM, Michael Stolz <mst...@pivotal.io>
> wrote:
> >>>
> >>> We should be very careful about how much resource we dedicate to
> >>> rebalancing.
> >>>
> >>> One of our competitors rebalances *much* faster than we do, but in
> doing
> >> so
> >>> they consume all available resources.
> >>>
> >>> At one bank that caused significant loss of incoming market data that
> was
> >>> coming in on a multicast feed, which had a severe adverse effect on the
> >>> pricing and risk management functions for a period of time. That bank
> >>> removed the competitor's product and for several years no distributed
> >>> caching was allowed by the chief architect at that bank. Until he left
> >> and
> >>> a new chief architect was named they didn't use any distributed caching
> >>> products. When they DID go back to using them, it pre-dated Geode, so
> >> they
> >>> used GemFire largely because GemFire does not consume all available
> >>> resources while rebalancing.
> >>>
> >>> I do think we need to improve our rebalancing such that it iterates
> until
> >>> it achieves balance, but not in a way that will consume all available
> >>> resources.
> >>>
> >>> --
> >>> Mike Stolz
> >>>
> >>>
> >>>> On Thu, Mar 8, 2018 at 2:25 PM, Nick Reich <nre...@pivotal.io> wrote:
> >>>>
> >>>> Team,
> >>>>
> >>>> The time required to undertake a rebalance of a geode cluster has
> often
> >>>> been an area for improvement noted by users. Currently, buckets are
> >> moved
> >>>> one at a time and we propose that creating a system that moved buckets
> >> in
> >>>> parallel could greatly improve performance for this feature.
> >>>>
> >>>> Previously, parallelization was implemented for adding redundant
> copies
> >>> of
> >>>> buckets to restore redundancy. However, moving buckets is a more
> >>>> complicated matter and requires a different approach than restoration
> >> of
> >>>> redundancy. The reason for this is that members could potentially both
> >>>> be gaining buckets and giving away buckets at the same time. While
> >> giving
> >>>> away a bucket, that member still has all of the data for the bucket,
> >>> until
> >>>> the receiving member has fully received the bucket and it can safely
> be
> >>>> removed from the original owner. This means that unless the member has
> >>> the
> >>>> memory overhead to store all of the buckets it will receive and all
> the
> >>>> buckets it started with, there is potential that parallel moving of
> >>> buckets
> >>>> could cause the member to run out of memory.
> >>>>
> >>>> For this reason, we propose a system that does (potentially) several
> >>> rounds
> >>>> of concurrent bucket moves:
> >>>> 1) A set of moves is calculated to improve balance that meet a
> >>> requirement
> >>>> that no member both receives and gives away a bucket (no member will
> >> have
> >>>> memory overhead of an existing bucket it is ultimately removing and a
> >> new
> >>>> bucket).
> >>>> 2) Conduct all calculated bucket moves in parallel. Parameters to
> >>> throttle
> >>>> this process (to prevent taking too many cluster resources, impacting
> >>>> performance) should be added, such as only allowing each member to
> >> either
> >>>> receive or send a maximum number of buckets concurrently.
> >>>> 3) If cluster is not yet balanced, perform additional iterations of
> >>>> calculating and conducting bucket moves, until balance is achieved or
> a
> >>>> possible maximum iterations is reached.
> >>>> Note: in both the existing and proposed system, regions are rebalanced
> >>> one
> >>>> at a time.
> >>>>
> >>>> Please let us know if you have feedback on this approach or additional
> >>>> ideas that should be considered.
> >>>>
> >>>
> >>
>
