Guys,

It seems that both mentioned problems have the same root cause: each cache has its own affinity function instance, which leads to a performance problem (we repeat the same calculations for each cache) and to the problem that FairAffinityFunction is stateful (co-located caches get different assignments if they were started on different topologies).

The obvious solution is to use the same affinity for a set of caches. As a result, all caches from one set will use the same assignment, which will be calculated exactly once and will not depend on the cache start topology.
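To make the idea concrete, here is a minimal configuration sketch (the cache names are made up, and this only shows the intended wiring of a shared affinity instance, not how the assignment calculation would be deduplicated internally):

    import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class SharedAffinityExample {
        public static void main(String[] args) {
            // A single affinity function instance shared by a set of co-located
            // caches, so all of them get identical partition assignments that do
            // not depend on the topology each cache was started on.
            RendezvousAffinityFunction aff = new RendezvousAffinityFunction(false, 1024);

            CacheConfiguration<Integer, Object> ordersCfg =
                new CacheConfiguration<>("orders");
            ordersCfg.setAffinity(aff);

            CacheConfiguration<Integer, Object> lineItemsCfg =
                new CacheConfiguration<>("lineItems");
            lineItemsCfg.setAffinity(aff);
        }
    }

With such wiring the affinity function, and hence the assignment, is the same for every cache in the set.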
On Mon, Apr 10, 2017 at 4:05 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:

> As for the default value of the useBalancer flag, I agree with Yakov: it
> must be enabled by default, because performance in steady state is usually
> more important than performance of rebalancing. For edge cases it can be
> disabled.
>
> Sergi
>
> 2017-04-10 15:04 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>
>> If RendezvousAffinity with useBalancer enabled is not much worse than
>> FairAffinity, I see no reason to keep the latter.
>>
>> Sergi
>>
>> 2017-04-10 13:00 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>
>>> Guys,
>>>
>>> We should not have it enabled by default because, as Taras mentioned,
>>> "in this case there is no guarantee that a partition doesn't move from
>>> one node to another when a node leaves the topology". Let's avoid any
>>> rush here. There is nothing terribly wrong with FairAffinity. It is
>>> not enabled by default, and at the very least we can always mark it as
>>> deprecated. It is better to first test the rendezvous affinity
>>> rigorously in terms of partition distribution and partition migration
>>> and then decide whether the results are acceptable.
>>>
>>> On Mon, Apr 10, 2017 at 12:43 PM, Yakov Zhdanov <yzhda...@apache.org> wrote:
>>>
>>>> We should have it enabled by default.
>>>>
>>>> --Yakov
>>>>
>>>> 2017-04-10 12:42 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>>>>
>>>>> Why wouldn't we have useBalancer always enabled?
>>>>>
>>>>> Sergi
>>>>>
>>>>> 2017-04-10 12:31 GMT+03:00 Taras Ledkov <tled...@gridgain.com>:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> I worked on issue https://issues.apache.org/jira/browse/IGNITE-3018,
>>>>>> which is related to the performance of the Rendezvous AF.
>>>>>>
>>>>>> But the Wang/Jenkins integer hash distribution is worse than MD5,
>>>>>> so I tried to use a simple partition balancer, close to the Fair AF,
>>>>>> for the Rendezvous AF.
>>>>>>
>>>>>> Take a look at the heatmaps of the distributions in the issue, e.g.:
>>>>>> - Comparison of the current Rendezvous AF and the new Rendezvous AF
>>>>>> based on the Wang/Jenkins hash:
>>>>>> https://issues.apache.org/jira/secure/attachment/12858701/004.png
>>>>>> - Comparison of the current Rendezvous AF and the new Rendezvous AF
>>>>>> based on the Wang/Jenkins hash with the partition balancer:
>>>>>> https://issues.apache.org/jira/secure/attachment/12858690/balanced.004.png
>>>>>>
>>>>>> When the balancer is enabled, the distribution of partitions across
>>>>>> nodes looks close to even, but in this case there is no guarantee
>>>>>> that a partition doesn't move from one node to another when a node
>>>>>> leaves the topology. It is not guaranteed, but we try to minimize it
>>>>>> because a sorted array of nodes is used (as for the pure Rendezvous
>>>>>> AF).
>>>>>>
>>>>>> I think we can use the new fast Rendezvous AF with a 'useBalancer'
>>>>>> flag instead of the Fair AF.
>>>>>>
>>>>>> On 09.04.2017 14:12, Valentin Kulichenko wrote:
>>>>>>
>>>>>>> What is the replacement for FairAffinityFunction?
>>>>>>>
>>>>>>> Generally I agree. If FairAffinityFunction can't be changed to
>>>>>>> provide consistent mapping, it should be dropped.
>>>>>>>
>>>>>>> -Val
>>>>>>>
>>>>>>> On Sun, Apr 9, 2017 at 3:50 AM, Sergi Vladykin
>>>>>>> <sergi.vlady...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Guys,
>>>>>>>>
>>>>>>>> It appeared that our FairAffinityFunction can assign the same
>>>>>>>> partitions to different nodes for different caches.
>>>>>>>>
>>>>>>>> It basically means that there is no collocation between the
>>>>>>>> caches at all, even if they have the same affinity.
>>>>>>>>
>>>>>>>> As a result, all SQL joins will not work (even collocated ones),
>>>>>>>> and other operations that rely on cache collocation will be
>>>>>>>> either broken or work slower than expected.
>>>>>>>>
>>>>>>>> All this stuff is really non-obvious, and I see no reason why we
>>>>>>>> should allow it. I suggest prohibiting this behavior and dropping
>>>>>>>> FairAffinityFunction before 2.0. We have to clearly document that
>>>>>>>> the same affinity function must provide the same partition
>>>>>>>> assignments for all the caches.
>>>>>>>>
>>>>>>>> Also, I know that Taras Ledkov was working on a decent stateless
>>>>>>>> replacement for FairAffinity, so we should not lose anything
>>>>>>>> here.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Sergi
>>>>>>
>>>>>> --
>>>>>> Taras Ledkov
>>>>>> Mail-To: tled...@gridgain.com
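For reference, below is a minimal self-contained sketch of the approach discussed above: rendezvous (highest-random-weight) assignment on top of a Wang/Jenkins-style integer mix, followed by an optional balancing pass in the spirit of the 'useBalancer' flag. The particular mix variant and the naive balancing heuristic are illustrative assumptions, not the actual code from IGNITE-3018:

    import java.util.Arrays;

    /** Sketch of rendezvous partition assignment with an optional balancing
     *  pass. Illustrative only; not Ignite's RendezvousAffinityFunction. */
    public class RendezvousSketch {
        /** One common Wang/Jenkins-style 32-bit integer mix. */
        static int mix(int h) {
            h = ~h + (h << 15);
            h ^= h >>> 12;
            h += h << 2;
            h ^= h >>> 4;
            h *= 2057;
            h ^= h >>> 16;
            return h;
        }

        /** Assigns each partition to the node with the highest combined hash. */
        static int[] assign(int parts, int[] nodeIds) {
            int[] owner = new int[parts];
            for (int p = 0; p < parts; p++) {
                int best = 0, bestHash = Integer.MIN_VALUE;
                for (int i = 0; i < nodeIds.length; i++) {
                    int h = mix(mix(p) ^ nodeIds[i]); // combine partition and node ids.
                    if (h > bestHash) { bestHash = h; best = i; }
                }
                owner[p] = best;
            }
            return owner;
        }

        /** Naive balancing pass: while a node owns more than ceil(parts/nodes)
         *  partitions, hand one of its partitions to the least-loaded node.
         *  Evens the distribution, but gives up the rendezvous property that
         *  a partition only moves when its owner leaves. */
        static void balance(int[] owner, int nodes) {
            int cap = (owner.length + nodes - 1) / nodes;
            int[] load = new int[nodes];
            for (int o : owner) load[o]++;
            for (int p = 0; p < owner.length; p++) {
                if (load[owner[p]] > cap) {
                    int min = 0;
                    for (int n = 1; n < nodes; n++)
                        if (load[n] < load[min]) min = n;
                    load[owner[p]]--;
                    owner[p] = min;
                    load[min]++;
                }
            }
        }

        public static void main(String[] args) {
            int[] nodes = {101, 202, 303, 404};
            int[] owner = assign(256, nodes);
            balance(owner, nodes.length);
            int[] load = new int[nodes.length];
            for (int o : owner) load[o]++;
            System.out.println(Arrays.toString(load)); // [64, 64, 64, 64]
        }
    }

The balancing pass shows the trade-off Taras and Vladimir mention: loads end up within ceil(parts/nodes), but a partition may now change owner on a topology change even when pure rendezvous assignment would have kept it in place.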