[
https://issues.apache.org/jira/browse/IGNITE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360722#comment-16360722
]
Alexey Goncharuk commented on IGNITE-6113:
------------------------------------------
Pavel,
1) The code around clearFuture looks suspicious to me: some of the
clearFuture.onDone() are sync-ed, some are not. Also, note that there is a
clearFuture.listen(), and the listener may be called either inside the sync
block, or outside (if the listener is invoked from another thread). In this
case, the reset() call is likely unsynchronized with the listener chain
invocation.
2) Partitions clear await is synchronous in system pool, we must avoid this. In
best case this will lead to a significant performance drop, in worst case - to
a deadlock. The wait should be asynchronous. We should probably also report
some sort of partition clear progress (or at least have a metric/mbean
indicating that rebalancing wont start because we are waiting for these
partitions).
3) There is a suspicious getter remaining() in GridDhtPartitionDemander - the
method is synchronized, but it returns a reference to a map. What if the map
changes afterwards?
4) Please add a specific test which will reproduce the absence of PME when
async eviction is happening. Also, we should add tests for the following
partition state transitions:
MOVING->RENTING->MOVING->OWNING (add an optional node crash for each transition)
RENTING->MOVING->RENTING->EVICTED (add an optional node crash for each
transition)
> Partition eviction prevents exchange from completion
> ----------------------------------------------------
>
> Key: IGNITE-6113
> URL: https://issues.apache.org/jira/browse/IGNITE-6113
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.1
> Reporter: Vladislav Pyatkov
> Assignee: Alexey Goncharuk
> Priority: Major
>
> I has waited for 3 hours for completion without any success.
> exchange-worker is blocked.
> {noformat}
> "exchange-worker-#92%DPL_GRID%grid554.ca.sbrf.ru%" #173 prio=5 os_prio=0
> tid=0x00007f0835c2e000 nid=0xb907 runnable [0x00007e74ab1d0000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007efee630a7c0> (a
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition$1)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
> at
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.assign(GridDhtPreloader.java:340)
> at
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1801)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
> {noformat}
> {noformat}
> "sys-#124%DPL_GRID%grid554.ca.sbrf.ru%" #278 prio=5 os_prio=0
> tid=0x00007e731c02d000 nid=0xbf4d runnable [0x00007e734e7f7000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:51)
> at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
> - locked <0x00007f056161bf88> (a java.lang.Object)
> at
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.writeBuffer(FileWriteAheadLogManager.java:1829)
> at
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:1572)
> at
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:1421)
> at
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.access$800(FileWriteAheadLogManager.java:1331)
> at
> org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:339)
> at
> org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.beforeReleaseWrite(PageMemoryImpl.java:1287)
> at
> org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1142)
> at
> org.gridgain.grid.internal.processors.cache.database.pagemem.PageImpl.releaseWrite(PageImpl.java:167)
> at
> org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writeUnlock(PageHandler.java:193)
> at
> org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:242)
> at
> org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:119)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:2886)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:2865)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.access$6900(BPlusTree.java:2515)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1607)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.doRemove(BPlusTree.java:1481)
> at
> org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.remove(BPlusTree.java:1451)
> at
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:307)
> at
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:637)
> at
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:517)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:664)
> at
> org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:1186)
> at
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:467)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1090)
> at
> org.gridgain.grid.cache.db.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:993)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:357)
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3621)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:599)
> - locked <0x00007f054d45bad8> (a
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCacheEntry)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:956)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:793)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:856)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:843)
> at
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6660)
> at
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)