Re: Triggering rebalancing on timeout or manually if the baseline topology is not reassembled

Pavel Kovalenko Wed, 18 Apr 2018 10:48:11 -0700

Ivan,

I think your version is better because it handles the case where several
nodes leave one after another, so there is no need to shrink the baseline
separately for each departed node. The new version also saves some
resources by using the internal scheduler.
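
For readers who do not want to open the JIRA attachment, the periodic
check outlined earlier in this thread (collect alive nodes, compare with
the baseline, drop nodes that stayed offline past the timeout) can be
sketched without any Ignite dependency. This is only an illustration and
not the attached BaselineWatcher: the class BaselineShrinkPlanner and its
check() method are hypothetical names, consistent IDs are simplified to
strings, and the caller is assumed to fetch the inputs from the cluster
and to apply the result through the cluster's baseline API (for example
IgniteCluster.setBaselineTopology, to the best of my knowledge).

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: decides which baseline nodes have been offline too long. */
class BaselineShrinkPlanner {
    /** How long a baseline node may stay offline before it is dropped. */
    private final long timeoutMillis;

    /** Consistent ID -> time (ms) at which the node was first seen missing. */
    private final Map<String, Long> missingSince = new HashMap<>();

    BaselineShrinkPlanner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * One periodic check. Returns the baseline that should be set;
     * it equals the input baseline when nothing has to change.
     */
    Set<String> check(Set<String> baselineIds, Set<String> aliveIds, long nowMillis) {
        // Forget nodes that came back or are no longer part of the baseline.
        missingSince.keySet().removeIf(id -> aliveIds.contains(id) || !baselineIds.contains(id));

        Set<String> result = new LinkedHashSet<>(baselineIds);

        for (String id : baselineIds) {
            if (aliveIds.contains(id))
                continue;

            // Remember when this node was first noticed as missing.
            long since = missingSince.computeIfAbsent(id, k -> nowMillis);

            // Drop the node once its downtime reaches the timeout.
            if (nowMillis - since >= timeoutMillis) {
                result.remove(id);
                missingSince.remove(id);
            }
        }

        return result;
    }
}
```

The caller would set a new baseline only when the returned set differs
from the current one, so a healthy cluster triggers no baseline change.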


2018-04-18 20:41 GMT+03:00 Ivan Rakov <ivan.glu...@gmail.com>:

> I can suggest an improvement to BaselineWatcher by Pavel. I've added a new
> version to https://issues.apache.org/jira/browse/IGNITE-8241 comments.
> Pavel, what do you think?
>
> Best Regards,
> Ivan Rakov
>
>
> On 17.04.2018 20:47, Denis Magda wrote:
>
>> Thanks, Pavel!
>>
>> Alexey, Ivan, could you check that there are no pitfalls in the example
>> and that it can be used as a template for our users?
>> https://issues.apache.org/jira/secure/attachment/12919452/BaselineWatcher.java
>>
>> --
>> Denis
>>
>> On Tue, Apr 17, 2018 at 10:40 AM, Pavel Kovalenko <jokse...@gmail.com>
>> wrote:
>>
>>> Denis,
>>>
>>> I've attached an example of how to manage the baseline automatically
>>> (it's named BaselineWatcher). It's just a concept and doesn't cover all
>>> possible cases, but it might be good for a start.
>>>
>>> 2018-04-13 2:14 GMT+03:00 Denis Magda <dma...@apache.org>:
>>>
>>>> Pavel, thanks for the suggestions. They would definitely work out. I
>>>> would document the one with the event subscription:
>>>> https://issues.apache.org/jira/browse/IGNITE-8241
>>>>
>>>> Could you help prepare a sample code snippet with such a listener that
>>>> will be added to the doc? I know that there are some caveats related to
>>>> the way such an event has to be processed.
>>>>
>>>> Ivan, I truly like your idea. Alex G., what are your thoughts on this?
>>>>
>>>> --
>>>> Denis
>>>>
>>>> On Thu, Apr 12, 2018 at 2:22 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> I have also heard complaints about the absence of an option to
>>>>> automatically change the baseline topology. They absolutely make sense.
>>>>> What Pavel suggested will work as a workaround. I think that in future
>>>>> releases we should give users an option to enable similar behavior via
>>>>> Ignite Configuration.
>>>>> It may be called "Baseline Topology change policy". I see it as a
>>>>> rule-based language that allows specifying the conditions for a BLT
>>>>> change using several parameters: a timeout and the minimum allowed
>>>>> number of partition copies left (maybe this option should also be
>>>>> provided at the per-cache-group level).
>>>>> The policy can also specify conditions for including new nodes in the
>>>>> BLT if they are present, including node attribute filters and so on.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Best Regards,
>>>>> Ivan Rakov
>>>>>
>>>>>
>>>>> On 12.04.2018 19:41, Pavel Kovalenko wrote:
>>>>>
>>>>>> Denis,
>>>>>>
>>>>>> It's just one of the ways to implement it. We can also subscribe to
>>>>>> node join / fail events to properly track the downtime of a node.
>>>>>>
>>>>>> 2018-04-12 19:38 GMT+03:00 Pavel Kovalenko <jokse...@gmail.com>:
>>>>>>
>>>>>>> Denis,
>>>>>>>
>>>>>>> Using our API we can implement this task as follows:
>>>>>>> Do each minute:
>>>>>>> 1) Get the consistent IDs of all alive server nodes =>
>>>>>>> ignite().context().discovery().aliveServerNodes() =>
>>>>>>> mapToConsistentIds().
>>>>>>> 2) Get the current baseline topology =>
>>>>>>> ignite().cluster().currentBaselineTopology()
>>>>>>> 3) For each node that is in the baseline but not among the alive
>>>>>>> server nodes, check the timeout for this node.
>>>>>>> 4) If the timeout is reached, remove the node from the baseline.
>>>>>>> 5) If the baseline has changed, set the new baseline =>
>>>>>>> ignite().cluster().setNewBaseline()
>>>>>>>
>>>>>>>
>>>>>>> 2018-04-12 2:18 GMT+03:00 Denis Magda <dma...@apache.org>:
>>>>>>>
>>>>>>>> Pavel, Val,
>>>>>>>>
>>>>>>>> So, it means that the rebalancing will be initiated only after an
>>>>>>>> administrator removes the failed node from the topology, right?
>>>>>>>>
>>>>>>>> Next, imagine that you are the IT administrator who has to automate
>>>>>>>> the rebalancing activation if the node has failed and not recovered
>>>>>>>> within 1 minute. What would you do, and what does Ignite provide to
>>>>>>>> fulfill the task?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Denis
>>>>>>>>
>>>>>>>> On Wed, Apr 11, 2018 at 1:01 PM, Pavel Kovalenko <jokse...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Denis,
>>>>>>>>>
>>>>>>>>> In case of an incomplete baseline topology, IgniteCache.rebalance()
>>>>>>>>> will do nothing, because this event doesn't trigger a partition
>>>>>>>>> exchange or an affinity change, so the states of the existing
>>>>>>>>> partitions are held.
>>>>>>>>>
>>>>>>>>> 2018-04-11 22:27 GMT+03:00 Valentin Kulichenko <
>>>>>>>>> valentin.kuliche...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Denis,
>>>>>>>>>>
>>>>>>>>>> In my understanding, in this case you should remove the node from
>>>>>>>>>> the BLT and that will trigger the rebalancing, no?
>>>>>>>>>>
>>>>>>>>>> -Val
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2018 at 12:23 PM, Denis Magda <dma...@gridgain.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Igniters,
>>>>>>>>>>>
>>>>>>>>>>> As we know, the rebalancing doesn't happen if one of the nodes
>>>>>>>>>>> goes down, thus shrinking the baseline topology. It complies with
>>>>>>>>>>> our assumption that the node should be recovered soon and there
>>>>>>>>>>> is no need to waste CPU/memory/networking resources of the
>>>>>>>>>>> cluster shifting the data around.
>>>>>>>>>>>
>>>>>>>>>>> However, there are always edge cases. I was reasonably asked how
>>>>>>>>>>> to trigger the rebalancing within the baseline topology manually
>>>>>>>>>>> or on timeout if:
>>>>>>>>>>>
>>>>>>>>>>>      - It's not expected that the failed node will be resurrected
>>>>>>>>>>>      in the near future, and
>>>>>>>>>>>      - It's not likely that the node will be replaced by another
>>>>>>>>>>>      one.
>>>>>>>>>>>
>>>>>>>>>>> The question: if I call IgniteCache.rebalance() or configure
>>>>>>>>>>> CacheConfiguration.rebalanceTimeout, will the rebalancing be
>>>>>>>>>>> fired within the baseline topology?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Denis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>
