[
https://issues.apache.org/jira/browse/IGNITE-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465642#comment-16465642
]
Aleksey Plekhanov edited comment on IGNITE-8443 at 5/7/18 9:01 AM:
-------------------------------------------------------------------
The main reason for this behavior: a transaction hangs when an error occurs
while processing a {{GridNearLockRequest}}. In
{{testPessimisticTxPutAllMultinode}}, the minor topology version changes after
rebalancing, and the partition of the primary key changes state to
{{RENTING}}. When we try to update data in this partition, an exception is
thrown, but no error response is sent to the node that initiated the
transaction.
Another simple reproducer for this case:
{code:java}
@Override protected IgniteConfiguration getConfiguration(final String igniteInstanceName) throws Exception {
    return super.getConfiguration(igniteInstanceName)
        .setCacheConfiguration(
            new CacheConfiguration()
                .setName(DEFAULT_CACHE_NAME)
                .setCacheMode(CacheMode.PARTITIONED)
                .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
        )
        .setEventStorageSpi(
            new NoopEventStorageSpi() {
                // Throw on node 1 when a cache entry is created, to simulate an error there.
                @Override public void record(Event evt) throws IgniteSpiException {
                    if (evt.type() == EVT_CACHE_ENTRY_CREATED &&
                        getTestIgniteInstanceIndex(igniteInstanceName) == 1)
                        throw new CacheException();
                }
            }
        );
}

public void testTxFailure() throws Exception {
    startGrids(2);

    IgniteCache cache0 = grid(0).cache(DEFAULT_CACHE_NAME);
    IgniteCache cache1 = grid(1).cache(DEFAULT_CACHE_NAME);

    grid(0).transactions().txStart(TransactionConcurrency.PESSIMISTIC,
        TransactionIsolation.REPEATABLE_READ);

    // The key is primary on node 1; this put is expected to hang (the issue being reproduced).
    cache0.put(primaryKey(cache1), 0);
}
{code}
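The failure mode described above can also be sketched outside of Ignite. The following is only an illustration using plain JDK classes, not Ignite code: {{LockResponseSketch}}, {{PartitionState}}, {{lock}}, and both handler methods are hypothetical names, and a {{CompletableFuture}} stands in for the lock future on the node that initiated the transaction. It shows that a handler which only logs the exception leaves the initiator waiting, while a handler which passes the error back lets the initiator fail fast.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LockResponseSketch {
    /** Hypothetical stand-in for partition states; only RENTING matters here. */
    enum PartitionState { OWNING, RENTING }

    /** Hypothetical stand-in for taking the lock on the primary node. */
    static void lock(PartitionState state) {
        if (state == PartitionState.RENTING)
            throw new IllegalStateException("Adding entry to partition that is concurrently evicted");
    }

    /** Logs the error only: the initiator's future is never completed. */
    static void handleWithoutErrorResponse(PartitionState state, CompletableFuture<Boolean> res) {
        try {
            lock(state);
            res.complete(true);
        } catch (RuntimeException e) {
            System.err.println("Failed to process lock request: " + e.getMessage());
        }
    }

    /** Passes the error back, so the initiator can fail instead of waiting. */
    static void handleWithErrorResponse(PartitionState state, CompletableFuture<Boolean> res) {
        try {
            lock(state);
            res.complete(true);
        } catch (RuntimeException e) {
            res.completeExceptionally(e);
        }
    }

    /** Waits briefly for the response and reports what the initiator observes. */
    static String await(CompletableFuture<Boolean> res) throws InterruptedException {
        try {
            res.get(200, TimeUnit.MILLISECONDS);
            return "locked";
        } catch (ExecutionException e) {
            return "failed";
        } catch (TimeoutException e) {
            return "hung";
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Boolean> lost = new CompletableFuture<>();
        handleWithoutErrorResponse(PartitionState.RENTING, lost);
        System.out.println(await(lost)); // prints "hung"

        CompletableFuture<Boolean> reported = new CompletableFuture<>();
        handleWithErrorResponse(PartitionState.RENTING, reported);
        System.out.println(await(reported)); // prints "failed"
    }
}
```

The 200 ms timeout exists only so that the sketch terminates; in the reproducer above there is no such timeout, which is why the transaction hangs.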
> Flaky failure of
> IgniteCacheClientNodeChangingTopologyTest.testPessimisticTxPutAllMultinode
> -------------------------------------------------------------------------------------------
>
> Key: IGNITE-8443
> URL: https://issues.apache.org/jira/browse/IGNITE-8443
> Project: Ignite
> Issue Type: Bug
> Reporter: Aleksey Plekhanov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Labels: MakeTeamcityGreenAgain
>
> Test fails on TC sometimes (failure rate: 30%) with the following error:
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for update.
>     at org.apache.ignite.internal.processors.cache.distributed.IgniteCacheClientNodeChangingTopologyTest.multinode(IgniteCacheClientNodeChangingTopologyTest.java:1855)
>     at org.apache.ignite.internal.processors.cache.distributed.IgniteCacheClientNodeChangingTopologyTest.testPessimisticTxPutAllMultinode(IgniteCacheClientNodeChangingTopologyTest.java:1673)
> {noformat}
> Each time, a few seconds before the failure, the following error appears in the log:
> {noformat}
> [ERROR][sys-stripe-10-#90529%distributed.IgniteCacheClientNodeChangingTopologyTest0%][GridDhtColocatedCache]
> <default> Failed to unmarshal at least one of the keys for lock request
> message: GridNearLockRequest [topVer=AffinityTopologyVersion [topVer=10,
> minorTopVer=0], miniId=1, dhtVers=[...],
> subjId=5ad87047-5d80-4530-bb48-f7c268400006, taskNameHash=0, createTtl=-1,
> accessTtl=-1, flags=6, filter=null, super=GridDistributedLockRequest
> [nodeId=5ad87047-5d80-4530-bb48-f7c268400006, nearXidVer=GridCacheVersion
> [topVer=136730132, order=1525250131532, nodeOrder=7], threadId=100107,
> futId=3e2912f2361-94bff164-8062-4fb4-8d85-c2e89e579148, timeout=0,
> isInTx=true, isInvalidate=false, isRead=false, isolation=REPEATABLE_READ,
> retVals=[...], txSize=0, flags=0, keysCnt=94,
> super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=136730132,
> order=1525250131532, nodeOrder=7], committedVers=null, rolledbackVers=null,
> cnt=0, super=GridCacheIdMessage [cacheId=1544803905]]]]
> class org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtInvalidPartitionException
> [part=54, msg=Adding entry to partition that is concurrently evicted
> [grp=default, part=54, shouldBeMoving=, belongs=true,
> topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0],
> curTopVer=AffinityTopologyVersion [topVer=10, minorTopVer=1]]]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.localPartition0(GridDhtPartitionTopologyImpl.java:923)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.localPartition(GridDhtPartitionTopologyImpl.java:798)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.localPartition(GridCachePartitionedConcurrentMap.java:69)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:88)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:955)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:525)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryExx(GridDhtCacheAdapter.java:545)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.lockAllAsync(GridDhtTransactionalCacheAdapter.java:987)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processNearLockRequest0(GridDhtTransactionalCacheAdapter.java:667)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$800(GridDhtTransactionalCacheAdapter.java:94)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$12$1.run(GridDhtTransactionalCacheAdapter.java:704)
>     at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}