I updated the issue [1] with the table of the average count of migrated
primary partitions when one node leaves.
[1] https://issues.apache.org/jira/browse/IGNITE-3018?focusedCommentId=15963015&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15963015
On 10.04.2017 18:00, Sergi Vladykin wrote:
Absolutely agree, let's get some numbers on RendezvousAffinity with both
variants: useBalancer enabled and disabled. Taras, can you provide them?
Anyway, at the moment we need to make a decision on what will get into 2.0.
I'm for dropping (or hiding) all the suspicious stuff and adding it back if
we fix it. Thus I'm going to move FairAffinity into a private package now.
Sergi
2017-04-10 16:55 GMT+03:00 Vladimir Ozerov <[email protected]>:
Sergi,
AFAIK the only reason why RendezvousAffinity is used by default is that
behavior on rebalance is no less important than steady state performance,
especially on large deployments and cloud environments, where nodes
constantly join and leave the topology. Let's stop guessing and discuss the
numbers: how many partition reassignments happen with the new
RendezvousAffinity flavor? I haven't seen any results so far.
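For reference, the kind of measurement being asked for can be sketched
roughly as follows. This is only an illustration of pure rendezvous
(highest-random-weight) hashing, not Ignite's RendezvousAffinityFunction:
the node names, the partition count and the helper names (weight, primary,
assignment, extra_moves) are all made up for the example, and MD5 is used
simply to keep the run deterministic.

```python
import hashlib


def weight(node: str, part: int) -> int:
    """Hash a (node, partition) pair to an integer weight."""
    digest = hashlib.md5(f"{node}:{part}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def primary(nodes, part):
    """Pure rendezvous: the node with the highest weight owns the partition."""
    return max(nodes, key=lambda n: weight(n, part))


def assignment(nodes, parts):
    """Primary owner for every partition on the given topology."""
    return [primary(nodes, p) for p in range(parts)]


def extra_moves(before, after, left):
    """Partitions that changed primary although their old primary survived."""
    return sum(1 for b, a in zip(before, after) if b != a and b != left)


nodes = [f"node-{i}" for i in range(8)]  # hypothetical topology
parts = 1024                             # hypothetical partition count
before = assignment(nodes, parts)

for left in nodes:
    after = assignment([n for n in nodes if n != left], parts)
    print(left, "owned", before.count(left),
          "moves between surviving nodes:", extra_moves(before, after, left))
```

With pure rendezvous hashing the second number should always be zero: a
partition only changes its primary when that primary is the node that left.
Any balancing step layered on top gives up this guarantee, which is why the
same count is worth collecting for the useBalancer variant.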
On Mon, Apr 10, 2017 at 4:39 PM, Andrey Gura <[email protected]> wrote:
Guys,
It seems that both mentioned problems have the same root cause: each
cache has its own affinity function instance. This leads to a
performance problem (we repeat the same calculations for each cache)
and to a problem related to the fact that FairAffinityFunction is stateful
(co-located caches can get different assignments if they were started on
different topologies).
The obvious solution is to share the same affinity across a set of caches.
As a result, all caches from one set will use the same assignment, which
will be calculated exactly once and will not depend on cache start topology.
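A minimal sketch of that idea, under stated assumptions: the Cache class,
shared_assignment and the node names below are hypothetical and are not
Ignite APIs. The assignment is a pure function of the partition count and
the current topology, and it is memoized so that every cache in the set
reuses one result.

```python
import hashlib
from functools import lru_cache

computations = 0  # how many times an assignment was actually calculated


def weight(node: str, part: int) -> int:
    """Hash a (node, partition) pair to an integer weight."""
    digest = hashlib.md5(f"{node}:{part}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


@lru_cache(maxsize=None)
def shared_assignment(parts: int, topology: tuple) -> tuple:
    """Stateless assignment, computed once per (partition count, topology)."""
    global computations
    computations += 1
    return tuple(max(topology, key=lambda n: weight(n, p)) for p in range(parts))


class Cache:
    """Hypothetical cache that delegates to the shared affinity of its set."""

    def __init__(self, name: str, parts: int = 256):
        self.name = name
        self.parts = parts

    def assignment(self, topology):
        # Sorting makes the lookup key independent of node join order.
        return shared_assignment(self.parts, tuple(sorted(topology)))


old_topology = ["node-a", "node-b"]
new_topology = ["node-a", "node-b", "node-c"]

persons = Cache("persons")
persons.assignment(old_topology)  # this cache was started on the old topology
orders = Cache("orders")          # this one is started later, on the new topology

same = persons.assignment(new_topology) == orders.assignment(new_topology)
print("collocated:", same, "computations:", computations)
```

Because nothing depends on the history of a particular cache, both caches
get identical assignments on the new topology, and the calculation runs
once per topology rather than once per cache.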
On Mon, Apr 10, 2017 at 4:05 PM, Sergi Vladykin <[email protected]> wrote:
As for the default value of the useBalancer flag, I agree with Yakov: it
must be enabled by default, because performance in steady state is usually
more important than performance of rebalancing. For edge cases it can be
disabled.
Sergi
2017-04-10 15:04 GMT+03:00 Sergi Vladykin <[email protected]>:
If the RendezvousAffinity with enabled useBalancer is not much worse than
FairAffinity, I see no reason to keep the latter.
Sergi
2017-04-10 13:00 GMT+03:00 Vladimir Ozerov <[email protected]>:
Guys,
We should not have it enabled by default because, as Taras mentioned, "in
this case there is no guarantee that a partition doesn't move from one
node to another when a node leaves the topology". Let's avoid any rush here.
There is nothing terribly wrong with FairAffinity. It is not enabled by
default, and at the very least we can always mark it as deprecated. It is
better to first test rendezvous affinity rigorously in terms of partition
distribution and partition migration, and then decide whether the results
are acceptable.
On Mon, Apr 10, 2017 at 12:43 PM, Yakov Zhdanov <[email protected]> wrote:
We should have it enabled by default.
--Yakov
2017-04-10 12:42 GMT+03:00 Sergi Vladykin <[email protected]>:
Why wouldn't we have useBalancer always enabled?
Sergi
2017-04-10 12:31 GMT+03:00 Taras Ledkov <[email protected]>:
Folks,
I worked on issue https://issues.apache.org/jira/browse/IGNITE-3018, which
is related to the performance of Rendezvous AF.
However, the distribution of the Wang/Jenkins integer hash is worse than
that of MD5. So I tried to use a simple partition balancer, close to
Fair AF, for Rendezvous AF.
Take a look at the heatmaps of the distributions in the issue, e.g.:
- Comparison of the current Rendezvous AF and the new Rendezvous AF based
on the Wang/Jenkins hash:
https://issues.apache.org/jira/secure/attachment/12858701/004.png
- Comparison of the current Rendezvous AF and the new Rendezvous AF based
on the Wang/Jenkins hash with the partition balancer:
https://issues.apache.org/jira/secure/attachment/12858690/balanced.004.png
When the balancer is enabled, the distribution of partitions across nodes
is close to even, but in this case there is no guarantee that a partition
doesn't move from one node to another when a node leaves the topology.
This is not guaranteed, but we try to minimize such moves because a sorted
array of nodes is used (as in pure Rendezvous AF).
I think we can use the new fast Rendezvous AF with the 'useBalancer' flag
instead of Fair AF.
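To make the trade-off concrete, here is a rough sketch of one possible
balancer of this kind. It is an assumption made for illustration and not
the code attached to IGNITE-3018: each partition walks the nodes in
descending rendezvous-weight order and takes the first node that is still
under a per-node cap of ceil(partitions / nodes). MD5 is used only for
determinism, and all names are hypothetical.

```python
import hashlib
from collections import Counter
from math import ceil


def weight(node: str, part: int) -> int:
    """Hash a (node, partition) pair to an integer weight."""
    digest = hashlib.md5(f"{node}:{part}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def balanced_assignment(nodes, parts):
    """Rendezvous ordering plus a per-node cap (hypothetical balancer)."""
    cap = ceil(parts / len(nodes))
    load = Counter()
    owners = []
    for part in range(parts):
        # Nodes sorted by weight, as in pure rendezvous hashing.
        ranked = sorted(nodes, key=lambda n: weight(n, part), reverse=True)
        # Take the best-ranked node that still has room under the cap.
        owner = next(n for n in ranked if load[n] < cap)
        load[owner] += 1
        owners.append(owner)
    return owners


nodes = [f"node-{i}" for i in range(8)]  # hypothetical topology
parts = 1024                             # hypothetical partition count

before = balanced_assignment(nodes, parts)
after = balanced_assignment(nodes[1:], parts)  # node-0 leaves

moves = sum(1 for b, a in zip(before, after) if b != a and b != nodes[0])
print("per-node load:", sorted(Counter(before).values()))
print("moves between surviving nodes:", moves)
```

The cap keeps the per-node load within one bound, but a partition that was
pushed to its second or third choice may land on a different node once the
cap changes, so moves between surviving nodes are no longer ruled out.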
On 09.04.2017 14:12, Valentin Kulichenko wrote:
What is the replacement for FairAffinityFunction?
Generally I agree. If FairAffinityFunction can't be changed to provide
consistent mapping, it should be dropped.
-Val
On Sun, Apr 9, 2017 at 3:50 AM, Sergi Vladykin <[email protected]> wrote:
Guys,
It turns out that our FairAffinityFunction can assign the same partitions
to different nodes for different caches.
This basically means that there is no collocation between the caches at
all, even if they have the same affinity.
As a result, SQL joins will not work (even collocated ones), and other
operations that rely on cache collocation will either be broken or work
slower than expected.
All this stuff is really non-obvious, and I see no reason why we should
allow that. I suggest prohibiting this behavior and dropping
FairAffinityFunction before 2.0. We have to clearly document that the same
affinity function must provide the same partition assignments for all the
caches.
Also, I know that Taras Ledkov was working on a decent stateless
replacement for FairAffinity, so we should not lose anything here.
Thoughts?
Sergi
--
Taras Ledkov
Mail-To: [email protected]