Reviewed, looks good, thanks for the fix.

On Wed, Mar 25, 2015 at 9:24 PM, Alexey Goncharuk <[email protected]> wrote:
> Semyon,
>
> I was looking at one of the timed out tests and found this piece of thread
> dump interesting:
>
> [20:08:23] Thread [name="ignite-#16529%sys-near.GridCacheNearRemoveFailureTest0%", id=21488, state=WAITING, blockCnt=1, waitCnt=11284]
> [20:08:23]     Lock [object=o.a.i.i.processors.affinity.GridAffinityAssignmentCache$AffinityReadyFuture@23931c53, ownerName=null, ownerId=-1]
> [20:08:23]     at sun.misc.Unsafe.park(Native Method)
> [20:08:23]     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> [20:08:23]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> [20:08:23]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> [20:08:23]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> [20:08:23]     at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:102)
> [20:08:23]     at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.awaitTopologyVersion(GridAffinityAssignmentCache.java:400)
> [20:08:23]     at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:362)
> [20:08:23]     at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:327)
> [20:08:23]     at o.a.i.i.processors.cache.GridCacheAffinityManager.nodes(GridCacheAffinityManager.java:187)
> [20:08:23]     at o.a.i.i.processors.cache.GridCacheAffinityManager.primary(GridCacheAffinityManager.java:205)
> [20:08:23]     at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.primaryNode(GridNearCacheEntry.java:630)
> [20:08:23]     at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.resetFromPrimary(GridNearCacheEntry.java:219)
> [20:08:23]     - locked o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry@1333a4f6
> [20:08:23]     at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture$MiniFuture.onResult(GridNearTxPrepareFuture.java:935)
> [20:08:23]     at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture.onResult(GridNearTxPrepareFuture.java:254)
> [20:08:23]     at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareResponse(IgniteTxHandler.java:363)
> [20:08:23]     at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:49)
> [20:08:23]     at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:77)
> [20:08:23]     at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:75)
> [20:08:23]     at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:299)
> [20:08:23]     at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:212)
> [20:08:23]     at o.a.i.i.processors.cache.GridCacheIoManager.access$300(GridCacheIoManager.java:44)
> [20:08:23]     at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:132)
> [20:08:23]     at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:664)
> [20:08:23]     at o.a.i.i.managers.communication.GridIoManager.access$1500(GridIoManager.java:57)
> [20:08:23]     at o.a.i.i.managers.communication.GridIoManager$5.run(GridIoManager.java:627)
> [20:08:23]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [20:08:23]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [20:08:23]     at java.lang.Thread.run(Thread.java:745)
>
> This thread waits for a new topology version to be ready, but it will not be
> ready until the update is completed. I analyzed all usages of the
> primaryNode(UUID) method, and a topology version is always available in the
> context of the call. I added a topology version argument to the
> primaryNode(...) method and propagated the correct topology version there.
> Can you review my changes in ignite-589?
>
> 2015-03-24 23:02 GMT-07:00 Semyon Boikov <[email protected]>:
>
> > Yes, this is possible, will implement this today.
> >
> > On Tue, Mar 24, 2015 at 6:38 PM, Dmitriy Setrakyan <[email protected]> wrote:
> >
> > > I think we can do better than flushing the near cache for every topology
> > > version change.
> > >
> > > Let's say that the topology version in a new cache entry is 1 and the
> > > actual topology version is 4. Then we could check whether the entry key's
> > > assignment changed between versions 1 and 4. For example, if the cache
> > > key's primary node didn't change on versions 2, 3, and 4, then there is
> > > no point in flushing the near cache entry.
> > >
> > > Would this be possible to implement?
> > >
> > > D.
> > >
> > > On Tue, Mar 24, 2015 at 8:11 AM, Semyon Boikov <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Today I investigated failures in the failover suite and found an issue
> > > > with the near cache update. Currently, when a near cache entry is
> > > > initialized we store the primary node id, and when a value is requested
> > > > from the near cache entry we check that the stored node is still the
> > > > primary (NearCacheEntry.valid()).
> > > > The following scenario is possible (it reproduces in our test):
> > > > - there are two nodes: A is primary, B is near
> > > > - a near cache entry is initialized on B; A is stored in the near cache entry as primary
> > > > - a new node C joins the grid and becomes the new primary
> > > > - the value is updated from C; C is not aware of the near reader B, so the value in the near cache on B is not updated
> > > > - node C leaves the grid, and A becomes primary again
> > > > - the value is requested from the near cache entry on B; it sees that the stored node A is still primary and returns the outdated value
> > > >
> > > > As a simple fix I changed GridNearCacheEntry to store the current
> > > > topology version at the moment the entry was initialized from the
> > > > primary, and the NearCacheEntry.valid() method now checks that the
> > > > topology version did not change. Assuming the topology does not change
> > > > often, this fix should not impact near cache performance.
> > > >
> > > > The only case when the topology can change often is the usage of client
> > > > nodes. When support for client nodes is fully implemented we will need
> > > > some way to check that the cache affinity topology did not change.
> > > >
> > > > Thoughts?
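[Editor's note: Semyon's fix can be illustrated with a small standalone model. The class and method names below are simplified stand-ins, not the actual Ignite `GridNearCacheEntry` API: the entry remembers the affinity topology version it was initialized under, and the validity check rejects it as soon as that version has advanced, even if the same node has become primary again.]

```java
// Minimal sketch (hypothetical names, not real Ignite code): a near-cache
// entry that records the topology version at initialization time and is
// considered invalid once the topology version changes.
public class NearEntrySketch {
    static final class NearCacheEntry {
        private final Object value;
        private final long topVer; // topology version when initialized from primary

        NearCacheEntry(Object value, long topVer) {
            this.value = value;
            this.topVer = topVer;
        }

        /** Valid only if the affinity topology has not changed since init. */
        boolean valid(long curTopVer) {
            return curTopVer == topVer;
        }

        Object value() {
            return value;
        }
    }

    public static void main(String[] args) {
        // Entry initialized on near node B while A is primary, at version 1.
        NearCacheEntry e = new NearCacheEntry("v1", 1L);

        // Same topology version: the cached value may still be served.
        System.out.println(e.valid(1L)); // true

        // Node C joined (ver 2) and left (ver 3). A is primary again, but the
        // version check detects the intermediate change and forces a refresh,
        // avoiding the stale-value scenario described above.
        System.out.println(e.valid(3L)); // false
    }
}
```

A node-id comparison alone cannot distinguish "A was primary the whole time" from "A, then C, then A again", which is exactly the stale-read case in the scenario above; comparing topology versions can.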

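[Editor's note: Dmitriy's proposed refinement, flushing only when the key's primary actually changed somewhere between the entry's topology version and the current one, could look roughly like this. This is a hypothetical sketch against an assumed per-version affinity-history lookup, not the real Ignite affinity API.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: instead of invalidating a near entry on any topology
// change, invalidate only if the key's primary node changed on some version
// between the entry's version and the current one.
public class AffinityHistorySketch {
    // topology version -> primary node id for the key
    // (a stand-in for a real affinity assignment history)
    private final Map<Long, String> primaryByVersion = new HashMap<>();

    void record(long topVer, String primaryNodeId) {
        primaryByVersion.put(topVer, primaryNodeId);
    }

    /** True if the primary is the same on every version in [entryVer, curVer]. */
    boolean primaryUnchanged(long entryVer, long curVer) {
        String primary = primaryByVersion.get(entryVer);

        for (long v = entryVer + 1; v <= curVer; v++) {
            if (!primaryByVersion.get(v).equals(primary))
                return false; // primary moved at version v: entry must be flushed
        }

        return true; // no reassignment: safe to keep the near entry
    }

    public static void main(String[] args) {
        AffinityHistorySketch aff = new AffinityHistorySketch();

        aff.record(1L, "A");
        aff.record(2L, "A"); // an unrelated node joined; primary for this key unchanged
        aff.record(3L, "C"); // C took over as primary
        aff.record(4L, "A"); // C left, A is primary again

        System.out.println(aff.primaryUnchanged(1L, 2L)); // true: no need to flush
        System.out.println(aff.primaryUnchanged(1L, 4L)); // false: flush, value may be stale
    }
}
```

The trade-off Dmitriy points at: the simple version check flushes on every topology change, while this check requires keeping some affinity history per version, which matters once client-node joins make topology changes frequent.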