[ 
https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791719#comment-16791719
 ] 

Ignite TC Bot commented on IGNITE-10799:
----------------------------------------

{panel:title=--> Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3256705&buildTypeId=IgniteTests24Java8_RunAll]

> Optimize affinity initialization/re-calculation
> -----------------------------------------------
>
>                 Key: IGNITE-10799
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10799
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>    Affects Versions: 2.4
>            Reporter: Pavel Kovalenko
>            Assignee: Pavel Kovalenko
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When persistence is enabled and a baseline is set, there are 2 main 
> code paths that recalculate affinity:
> {noformat}
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#onServerJoinWithExchangeMergeProtocol
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#onServerLeftWithExchangeMergeProtocol
> {noformat}
> Both of them follow the same recalculation approach:
> 1) Take a current baseline (ideal assignment).
> 2) Filter out offline nodes from it.
> 3) Choose new primary nodes where the previous primary went away.
> 4) Place the temporary primary nodes into the late affinity assignment set.
> Looking at the implementation details, we can see that it performs a lot of 
> unnecessary online-node cache lookups and array list copies. Recalculation 
> becomes too slow for replicated caches (it takes on the order of P * N 
> operations on each node, where P is the partition count and N is the number 
> of nodes in the cluster). With a large partition count or a large cluster, it 
> may take a few seconds, which is unacceptable because this process happens 
> during PME and freezes ongoing cluster operations.
> We should investigate possible bottlenecks and improve the performance of 
> affinity recalculation.
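
The four recalculation steps in the quoted description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual CacheAffinitySharedManager code: the class AffinityRecalcSketch, the method recalculate, the Result holder, and the use of plain strings as node IDs are all hypothetical names introduced for this example. It does show where the P * N cost for replicated caches comes from: every partition's owner list contains every node, and each entry is checked against the set of online nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of baseline-based affinity recalculation; not Ignite code. */
public class AffinityRecalcSketch {
    /** Holds the recalculated assignment and the temporary primaries. */
    static final class Result {
        /** Partition index -> online owners, primary first. */
        final List<List<String>> assignment = new ArrayList<>();

        /** Partition index -> temporary primary (a stand-in for the late affinity set). */
        final Map<Integer, String> temporaryPrimaries = new HashMap<>();
    }

    /**
     * @param ideal Ideal (baseline) assignment: partition index -> owners, primary first.
     * @param onlineNodes IDs of the nodes that are currently online.
     */
    static Result recalculate(List<List<String>> ideal, Set<String> onlineNodes) {
        Result res = new Result();

        for (int p = 0; p < ideal.size(); p++) {
            // Step 1: take the ideal assignment for this partition.
            List<String> idealOwners = ideal.get(p);

            // Step 2: filter out offline nodes with a single pass and a single copy.
            List<String> owners = new ArrayList<>(idealOwners.size());

            for (String node : idealOwners) {
                if (onlineNodes.contains(node))
                    owners.add(node);
            }

            // Steps 3 and 4: if the ideal primary is gone, the first surviving owner
            // becomes the primary and is recorded as a temporary one.
            if (!owners.isEmpty() && !owners.get(0).equals(idealOwners.get(0)))
                res.temporaryPrimaries.put(p, owners.get(0));

            res.assignment.add(owners);
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<String>> ideal = Arrays.asList(
            Arrays.asList("A", "B", "C"),
            Arrays.asList("B", "C", "A"));

        Set<String> online = new HashSet<>(Arrays.asList("A", "C"));

        Result res = recalculate(ideal, online);

        System.out.println("assignment = " + res.assignment);
        System.out.println("temporary primaries = " + res.temporaryPrimaries);
    }
}
```

In the usage example, node B is offline, so partition 0 keeps its ideal primary A, while partition 1 loses its ideal primary B and gets C as a temporary primary. Looking the online nodes up in a hash set built once and copying each owner list only once is the kind of saving the issue points at; whether the real fix takes this shape is not stated in the description.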



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)