Semyon, I was looking at one of the timed-out tests and found this piece of the thread dump interesting:
[20:08:23] Thread [name="ignite-#16529%sys-near.GridCacheNearRemoveFailureTest0%", id=21488, state=WAITING, blockCnt=1, waitCnt=11284]
[20:08:23]     Lock [object=o.a.i.i.processors.affinity.GridAffinityAssignmentCache$AffinityReadyFuture@23931c53, ownerName=null, ownerId=-1]
[20:08:23]         at sun.misc.Unsafe.park(Native Method)
[20:08:23]         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[20:08:23]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
[20:08:23]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
[20:08:23]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
[20:08:23]         at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:102)
[20:08:23]         at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.awaitTopologyVersion(GridAffinityAssignmentCache.java:400)
[20:08:23]         at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:362)
[20:08:23]         at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:327)
[20:08:23]         at o.a.i.i.processors.cache.GridCacheAffinityManager.nodes(GridCacheAffinityManager.java:187)
[20:08:23]         at o.a.i.i.processors.cache.GridCacheAffinityManager.primary(GridCacheAffinityManager.java:205)
[20:08:23]         at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.primaryNode(GridNearCacheEntry.java:630)
[20:08:23]         at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.resetFromPrimary(GridNearCacheEntry.java:219)
[20:08:23]         - locked o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry@1333a4f6
[20:08:23]         at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture$MiniFuture.onResult(GridNearTxPrepareFuture.java:935)
[20:08:23]         at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture.onResult(GridNearTxPrepareFuture.java:254)
[20:08:23]         at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareResponse(IgniteTxHandler.java:363)
[20:08:23]         at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:49)
[20:08:23]         at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:77)
[20:08:23]         at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:75)
[20:08:23]         at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:299)
[20:08:23]         at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:212)
[20:08:23]         at o.a.i.i.processors.cache.GridCacheIoManager.access$300(GridCacheIoManager.java:44)
[20:08:23]         at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:132)
[20:08:23]         at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:664)
[20:08:23]         at o.a.i.i.managers.communication.GridIoManager.access$1500(GridIoManager.java:57)
[20:08:23]         at o.a.i.i.managers.communication.GridIoManager$5.run(GridIoManager.java:627)
[20:08:23]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[20:08:23]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[20:08:23]         at java.lang.Thread.run(Thread.java:745)

This thread waits for the new topology version to be ready, but it will not be ready until the update is completed.
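To make the hang easier to see, here is a rough sketch of the pattern (simplified; these are not the real Ignite classes or signatures): the system-pool thread that processes the prepare response resolves the primary through the latest affinity and parks on the ready future, even though the topology version it actually needs is already known from its own context.

import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Simplified model of an affinity cache with per-version "ready" futures. */
class AffinityCacheSketch {
    /** One future per topology version, completed when the exchange for that version finishes. */
    private final ConcurrentMap<Long, CompletableFuture<Void>> readyFuts = new ConcurrentHashMap<>();

    /** Last topology version whose affinity is already calculated. */
    private volatile long lastReadyVer = 1;

    CompletableFuture<Void> readyFuture(long topVer) {
        if (topVer <= lastReadyVer)
            return CompletableFuture.completedFuture(null);

        return readyFuts.computeIfAbsent(topVer, v -> new CompletableFuture<>());
    }

    /** Called by the exchange when a new version becomes ready. */
    void onVersionReady(long topVer) {
        lastReadyVer = topVer;

        CompletableFuture<Void> fut = readyFuts.remove(topVer);

        if (fut != null)
            fut.complete(null);
    }

    /** The pattern from the thread dump: block until the latest version is ready. */
    UUID primaryNodeAwaitingLatest(Object key, long latestTopVer) throws Exception {
        readyFuture(latestTopVer).get(); // Parks here, like awaitTopologyVersion() in the dump.

        return lookupPrimary(key, latestTopVer);
    }

    /** Proposed shape: the caller passes a version it already has, so no waiting is needed. */
    UUID primaryNode(Object key, long knownTopVer) {
        assert knownTopVer <= lastReadyVer : "affinity for this version must already be calculated";

        return lookupPrimary(key, knownTopVer);
    }

    private UUID lookupPrimary(Object key, long topVer) {
        // Placeholder for the real affinity calculation.
        return UUID.nameUUIDFromBytes((key.hashCode() + ":" + topVer).getBytes());
    }
}

primaryNode(key, knownTopVer) above is the shape of the change I describe next.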
I analyzed all usages of the primaryNode(UUID) method and there is always a topology version available in the context of the call. I added an argument to the primaryNode(...) method and propagated the correct topology version there. Can you review my changes in ignite-589? (I also put a rough sketch of the near-entry validity check we discussed below the quoted thread.)

2015-03-24 23:02 GMT-07:00 Semyon Boikov <[email protected]>:

> Yes, this is possible, will implement this today.
>
> On Tue, Mar 24, 2015 at 6:38 PM, Dmitriy Setrakyan <[email protected]> wrote:
>
> > I think we can do better than flushing the near cache for every topology version change.
> >
> > Let's say that the topology version in the new cache entry is 1 and the actual topology version is 4. Then we could check whether the key's primary assignment changed between versions 1 and 4. For example, if the primary node for the cache key did not change on versions 2, 3, and 4, then there is no point in flushing the near cache entry.
> >
> > Would this be possible to implement?
> >
> > D.
> >
> > On Tue, Mar 24, 2015 at 8:11 AM, Semyon Boikov <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Today I investigated failures in the failover suite and found an issue with near cache updates. Currently, when a near cache entry is initialized we store the primary node id, and when the value is requested from the near cache entry we check that the stored node is still primary (NearCacheEntry.valid()).
> > >
> > > The following scenario is possible (it reproduces in our test):
> > > - there are two nodes: A is primary, B is near
> > > - the near cache entry is initialized on B, and A is stored in the near cache entry as the primary
> > > - a new node C joins the grid and becomes the new primary
> > > - the value is updated from C; C is not aware of the near reader B, so the value in the near cache on B is not updated
> > > - node C leaves the grid and A becomes primary again
> > > - the value is requested from the near cache entry on B; the entry sees that the stored node A is still primary and returns the outdated value
> > >
> > > As a simple fix I changed GridNearCacheEntry to store the current topology version at the moment the entry was initialized from the primary, and NearCacheEntry.valid() now checks that the topology version did not change. Assuming the topology does not change often, this fix should not impact near cache performance.
> > >
> > > The only case when the topology can change often is the usage of client nodes. When support for client nodes is fully implemented we will need some way to check that the cache affinity topology did not change.
> > >
> > > Thoughts?
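For the record, here is roughly how I read the check we are converging on (a simplified sketch, not the actual GridNearCacheEntry code; the primaryAt function is a hypothetical stand-in for the affinity assignment history): the entry remembers the topology version it was initialized on, and only serves the cached value while the key's primary did not change between that version and the current one.

import java.util.UUID;
import java.util.function.BiFunction;

/** Simplified stand-in for a near cache entry that tracks its init topology version. */
class NearEntrySketch {
    private final Object key;
    private final Object val;

    /** Topology version the value was read from the primary on. */
    private final long initTopVer;

    /** (key, topVer) -> primary node id; stands in for the affinity assignment history. */
    private final BiFunction<Object, Long, UUID> primaryAt;

    NearEntrySketch(Object key, Object val, long initTopVer,
        BiFunction<Object, Long, UUID> primaryAt) {
        this.key = key;
        this.val = val;
        this.initTopVer = initTopVer;
        this.primaryAt = primaryAt;
    }

    /** The simple fix from the original mail: valid only while the topology version did not change. */
    boolean validStrict(long curTopVer) {
        return curTopVer == initTopVer;
    }

    /**
     * Refinement discussed above: still valid if the primary for this key stayed the same
     * on every version between init and current, so unrelated topology changes (e.g. client
     * nodes joining) do not invalidate the near entry.
     */
    boolean valid(long curTopVer) {
        UUID initPrimary = primaryAt.apply(key, initTopVer);

        for (long v = initTopVer + 1; v <= curTopVer; v++) {
            if (!initPrimary.equals(primaryAt.apply(key, v)))
                return false; // Primary moved at some point, so the cached value may be stale.
        }

        return true;
    }

    /** Returns the cached value, or null if the caller must reload from the primary. */
    Object value(long curTopVer) {
        return valid(curTopVer) ? val : null;
    }
}

validStrict() models the simple fix from your original mail; valid() is the relaxed check we discussed in the middle of the thread.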
