Semyon,

I was looking at one of the timed out tests and found this piece of thread
dump interesting:

[20:08:23]Thread [name="ignite-#16529%sys-near.GridCacheNearRemoveFailureTest0%", id=21488, state=WAITING, blockCnt=1, waitCnt=11284]
[20:08:23]    Lock [object=o.a.i.i.processors.affinity.GridAffinityAssignmentCache$AffinityReadyFuture@23931c53, ownerName=null, ownerId=-1]
[20:08:23]        at sun.misc.Unsafe.park(Native Method)
[20:08:23]        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[20:08:23]        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
[20:08:23]        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
[20:08:23]        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
[20:08:23]        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:102)
[20:08:23]        at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.awaitTopologyVersion(GridAffinityAssignmentCache.java:400)
[20:08:23]        at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:362)
[20:08:23]        at o.a.i.i.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:327)
[20:08:23]        at o.a.i.i.processors.cache.GridCacheAffinityManager.nodes(GridCacheAffinityManager.java:187)
[20:08:23]        at o.a.i.i.processors.cache.GridCacheAffinityManager.primary(GridCacheAffinityManager.java:205)
[20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.primaryNode(GridNearCacheEntry.java:630)
[20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry.resetFromPrimary(GridNearCacheEntry.java:219)
[20:08:23]        - locked o.a.i.i.processors.cache.distributed.near.GridNearCacheEntry@1333a4f6
[20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture$MiniFuture.onResult(GridNearTxPrepareFuture.java:935)
[20:08:23]        at o.a.i.i.processors.cache.distributed.near.GridNearTxPrepareFuture.onResult(GridNearTxPrepareFuture.java:254)
[20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareResponse(IgniteTxHandler.java:363)
[20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:49)
[20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:77)
[20:08:23]        at o.a.i.i.processors.cache.transactions.IgniteTxHandler$2.apply(IgniteTxHandler.java:75)
[20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:299)
[20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:212)
[20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager.access$300(GridCacheIoManager.java:44)
[20:08:23]        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:132)
[20:08:23]        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:664)
[20:08:23]        at o.a.i.i.managers.communication.GridIoManager.access$1500(GridIoManager.java:57)
[20:08:23]        at o.a.i.i.managers.communication.GridIoManager$5.run(GridIoManager.java:627)
[20:08:23]        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[20:08:23]        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[20:08:23]        at java.lang.Thread.run(Thread.java:745)

This thread waits for the new topology version to become ready, but the
version will not be ready until the update completes. I analyzed all usages
of the primaryNode(UUID) method, and a topology version is always available
in the context of the call, so I added a topology version argument to
primaryNode(...) and propagated the correct version there. Can you review my
changes in ignite-589?

2015-03-24 23:02 GMT-07:00 Semyon Boikov <[email protected]>:

> Yes, this is possible, will implement this today.
>
> On Tue, Mar 24, 2015 at 6:38 PM, Dmitriy Setrakyan <[email protected]>
> wrote:
>
> > I think we can do better than flushing the near cache for every topology
> > version change.
> >
> > Let's say that the topology version stored in the new cache entry is 1
> > and the actual topology version is 4. Then we could check whether the
> > entry key's assignment changed between 1 and 4. For example, if the key's
> > primary node didn't change on versions 2, 3, and 4, then there is no
> > point in flushing the near cache entry.
> >
> > Would this be possible to implement?
> >
> > D.
> >
> > On Tue, Mar 24, 2015 at 8:11 AM, Semyon Boikov <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Today I investigated failures in the failover suite and found an issue
> > > with the near cache update. Currently, when a near cache entry is
> > > initialized we store the primary node id, and when a value is requested
> > > from the near cache entry we check that the stored node is still the
> > > primary (NearCacheEntry.valid()).
> > > The following scenario is possible (it reproduces in our test):
> > > - there are two nodes: A is the primary, B is near
> > > - a near cache entry is initialized on B; A is stored in the near cache
> > > entry as the primary
> > > - a new node C joins the grid and becomes the new primary
> > > - the value is updated from C; C is not aware of near reader B, so the
> > > value in the near cache on B is not updated
> > > - node C leaves the grid, and A becomes the primary again
> > > - the value is requested from the near cache entry on B; it sees that
> > > the stored node A is still the primary and returns the outdated value
> > >
> > > As a simple fix, I changed GridNearCacheEntry to store the current
> > > topology version at the moment the entry is initialized from the
> > > primary, and NearCacheEntry.valid() now checks that the topology version
> > > did not change. Assuming the topology does not change often, this fix
> > > should not impact near cache performance.
> > >
> > > The only case when the topology can change often is the use of client
> > > nodes. When support for client nodes is fully implemented, we will need
> > > some way to check that the cache affinity topology did not change.
> > >
> > > Thoughts?
> > >
> >
>
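
To make the discussion above concrete, here is a minimal, self-contained
sketch contrasting the simple fix (any topology change invalidates the near
entry) with the suggested optimization (the entry stays valid while the
key's primary is unchanged). All names and the per-version assignment table
below are stand-ins, not Ignite internals:

import java.util.HashMap;
import java.util.Map;

// Sketch only: stand-in names and data, not Ignite's actual internals.
final class NearEntryValiditySketch {
    // Hypothetical history: primary node for one key at each topology version.
    static final Map<Long, String> PRIMARY_BY_VER = new HashMap<>();

    static {
        PRIMARY_BY_VER.put(1L, "A");
        PRIMARY_BY_VER.put(2L, "A");
        PRIMARY_BY_VER.put(3L, "A");
        PRIMARY_BY_VER.put(4L, "A");
    }

    // Simple fix: any topology change invalidates the entry.
    static boolean validStrict(long entryTopVer, long curTopVer) {
        return entryTopVer == curTopVer;
    }

    // Optimization: stay valid while the key's primary did not change on
    // any version newer than the one stored in the entry.
    static boolean validRelaxed(long entryTopVer, long curTopVer) {
        String primaryAtInit = PRIMARY_BY_VER.get(entryTopVer);

        for (long v = entryTopVer + 1; v <= curTopVer; v++) {
            if (!primaryAtInit.equals(PRIMARY_BY_VER.get(v)))
                return false; // Primary moved at some point: flush the entry.
        }

        return true;
    }

    public static void main(String[] args) {
        System.out.println(validStrict(1, 4));  // false -> entry flushed
        System.out.println(validRelaxed(1, 4)); // true  -> entry kept
    }
}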
