Maxim, I checked, and it seems that the send retry count is used only in the cache IO manager, and that usage is semantically very far from what I suggest. The resend count limits the number of send attempts, while I meant the case where the send succeeds but there may be problems on the supplier side.
--Yakov

2018-07-17 19:01 GMT+03:00 Maxim Muzafarov <maxmu...@gmail.com>:

> Yakov,
>
> But we already have DFLT_SEND_RETRY_CNT and DFLT_SEND_RETRY_DELAY for
> configuring our CommunicationSPI behavior. What if a user configures these
> parameters in their own way and then sees a lot of WARN messages in the log
> that make no sense?
>
> Maybe we should use GridCachePartitionExchangeManager#forceRebalance (or
> maybe forceReassign) if rebalancing still fails after all those retries.
> What do you think?
>
> Mon, Jul 16, 2018 at 21:12, Yakov Zhdanov <yzhda...@gridgain.com>:
>
> > Maxim, I looked at the code you provided. I think we need to add some
> > timeout validation on the demander side: if there is no supply message
> > within 30 secs, output a warning to the logs and repeat the demanding
> > process. This should apply to any demand message throughout the
> > rebalancing process, not only the 1st one.
> >
> > You can use the following message:
> >
> > Failed to wait for supply message from node within 30 secs [cache=C, partId=XX]
> >
> > Alex Goncharuk, do you have comments here?
> >
> > Yakov Zhdanov
> > www.gridgain.com
> >
> > 2018-07-14 19:45 GMT+03:00 Maxim Muzafarov <maxmu...@gmail.com>:
> >
> > > Yakov,
> > >
> > > Yes, you're right. The whole rebalancing progress will be stopped.
> > >
> > > Actually, the rebalancing order doesn't matter; you're right about that
> > > too. The Javadoc just describes the idea of how rebalance should work
> > > for caches, but in fact it doesn't work as described. Personally, I'd
> > > prefer to start the rebalance of each cache group asynchronously and
> > > independently.
> > >
> > > Please look at my reproducer [1].
> > >
> > > Scenario:
> > > Cluster with two REPLICATED caches.
> > > Start a new node.
> > > The first cache group's rebalance fails to start (e.g. network issues) -
> > > it's OK.
> > > The second cache group's rebalance will never be started - all further
> > > progress is stuck (I think rebalance here should be started!).
> > >
> > > [1]
> > > https://github.com/Mmuzaf/ignite/blob/rebalance-cancel/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/rebalancing/GridCacheRebalancingCancelSelfTest.java
> > >
> > > Fri, Jul 13, 2018 at 17:46, Yakov Zhdanov <yzhda...@apache.org>:
> > >
> > > > Maxim, I do not understand the problem. Imagine I do not have any
> > > > ordering, but rebalancing of some cache fails to start - so, in my
> > > > understanding, overall rebalancing progress becomes blocked. Is that
> > > > true?
> > > >
> > > > Can you please provide a reproducer for your problem?
> > > >
> > > > --Yakov
> > > >
> > > > 2018-07-09 16:42 GMT+03:00 Maxim Muzafarov <maxmu...@gmail.com>:
> > > >
> > > > > Hello Igniters,
> > > > >
> > > > > Each cache group has a “rebalance order” property. As the javadoc for
> > > > > getRebalanceOrder() says: “Note that cache with order {@code 0} does
> > > > > not participate in ordering. This means that cache with rebalance
> > > > > order {@code 0} will never wait for any other caches. All caches with
> > > > > order {@code 0} will be rebalanced right away concurrently with each
> > > > > other and ordered rebalance processes. If not set, cache order is 0,
> > > > > i.e. rebalancing is not ordered.”
> > > > >
> > > > > In fact, GridCachePartitionExchangeManager always builds a chain of
> > > > > rebalancing cache groups to start (even for cache order ZERO):
> > > > >
> > > > > ignite-sys-cache -> cacheR -> cacheR3 -> cacheR2 -> cacheR5 -> cacheR1.
> > > > >
> > > > > If one of these groups fails to start, the groups after it will never
> > > > > be run.
> > > > >
> > > > > *Question 1*: Should we fix the javadoc description or create a bug
> > > > > for fixing such rebalance behavior?
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2630
> > >
> > > --
> > > --
> > > Maxim Muzafarov
>
> --
> --
> Maxim Muzafarov
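
For illustration only, the demander-side timeout check discussed in this thread could be sketched roughly as below. This is not Ignite code: SupplyWatchdog, awaitSupply, sendDemand, and the supplies queue are all hypothetical names, the timeout is taken in milliseconds to keep the sketch easy to try, and the warning text is adapted from the message proposed above. The point it tries to show is the difference from a send retry count: the demand is assumed to be sent successfully each time, and the loop only reacts to the supplier staying silent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical demander-side watchdog; an assumption for illustration, not part of Ignite. */
final class SupplyWatchdog {
    private final long timeoutMs;
    private final int maxDemands;

    /** Collected warnings; a real implementation would write them to the node log instead. */
    final List<String> warnings = new ArrayList<>();

    /** Number of demand messages sent so far. */
    int demandsSent;

    SupplyWatchdog(long timeoutMs, int maxDemands) {
        this.timeoutMs = timeoutMs;
        this.maxDemands = maxDemands;
    }

    /**
     * Sends a demand and waits for a supply message. If nothing arrives within the
     * timeout, records a warning and demands again, up to maxDemands times.
     *
     * @return the supply message, or null if the supplier never answered
     *         (or the waiting thread was interrupted).
     */
    String awaitSupply(String cache, int partId, Runnable sendDemand, BlockingQueue<String> supplies) {
        for (int attempt = 0; attempt < maxDemands; attempt++) {
            // The send itself is assumed to succeed; send retries are a separate concern.
            sendDemand.run();
            demandsSent++;

            String msg;
            try {
                msg = supplies.poll(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }

            if (msg != null)
                return msg;

            warnings.add(String.format(
                "Failed to wait for supply message from node within %d ms [cache=%s, partId=%d]",
                timeoutMs, cache, partId));
        }

        return null;
    }
}
```

With a supplier that ignores the first demand and answers the second, awaitSupply would return the supply message after recording one warning. What should happen once all demands time out (for example, forcing a rebalance as suggested above) is left open here.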