[ https://issues.apache.org/jira/browse/IGNITE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578176#comment-16578176 ]
Anton Vinogradov edited comment on IGNITE-9053 at 8/13/18 12:05 PM:
--------------------------------------------------------------------
According to the hung tests, we always have two nodes failing simultaneously:
{noformat}
[00:26:47]W: [org.apache.ignite:ignite-core] [2018-07-28 21:26:47,476][ERROR][sys-#97781%partitioned.GridCachePartitionedDataStructuresFailoverSelfTest1%][TcpCommunicationSpi] Failed to send message to remote node [node=TcpDiscoveryNode [id=6c4763cb-2ace-4afe-aa47-870a1dc00003 ...
[00:26:47]W: [org.apache.ignite:ignite-core] [2018-07-28 21:26:47,479][ERROR][sys-#97781%partitioned.GridCachePartitionedDataStructuresFailoverSelfTest1%][TcpCommunicationSpi] Failed to send message to remote node [node=TcpDiscoveryNode [id=471ab864-53f1-444f-a29b-dc522be00004 ...
{noformat}
Both nodes are informed via CQ.
Both received the event but failed before processing it.
So we get two node failure events here.
While processing the first event, we close the first CQ (tx) future inside
{{HighPriorityListener}} and wait for the second (all) future inside
{{removeExplicitNodeLocks}}.
The second future can't complete until the first event has been processed, so the test hangs.
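To make the circular wait easier to see, here is a minimal, simplified Java model of it. It is not Ignite code: the class NodeFailureDeadlockSketch, the method handleTwoFailures and the futures txFut/allFut are hypothetical stand-ins, and the model assumes that node failure events are processed strictly one at a time on a single thread. The first handler closes txFut (the {{HighPriorityListener}} analogue) and then blocks on allFut (the {{removeExplicitNodeLocks}} analogue), but allFut is only completed by the handler for the second event, which is queued behind the first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of two node failure events handled sequentially on one thread. */
class NodeFailureDeadlockSketch {

    /**
     * Submits two "node failed" events to a single-threaded worker and reports
     * whether both were processed within timeoutMs.
     *
     * @param waitInsideHandler if true, the first handler blocks on allFut
     *        (the removeExplicitNodeLocks-like wait); if false, it does not.
     */
    static boolean handleTwoFailures(boolean waitInsideHandler, long timeoutMs) {
        // Assumption: events are processed strictly one after another.
        ExecutorService evtWorker = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> txFut = new CompletableFuture<>();  // stand-in for the first (tx) fut
        CompletableFuture<Void> allFut = new CompletableFuture<>(); // stand-in for the second (all) fut
        CountDownLatch processed = new CountDownLatch(2);

        // Event 1: the first node failed.
        evtWorker.execute(() -> {
            txFut.complete(null);   // closed in the HighPriorityListener analogue
            if (waitInsideHandler)
                allFut.join();      // blocks: only the event 2 handler completes allFut
            processed.countDown();
        });

        // Event 2: the second node failed. It is queued behind event 1.
        evtWorker.execute(() -> {
            allFut.complete(null);
            processed.countDown();
        });

        boolean finished = false;
        try {
            finished = processed.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            allFut.complete(null); // break the wait so the worker thread can exit
            evtWorker.shutdown();
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(handleTwoFailures(false, 1_000)); // true
        System.out.println(handleTwoFailures(true, 300));    // false: event 1 waits on event 2, which never starts
    }
}
```

In the blocking variant the latch can never reach zero before the timeout, because the only code that completes allFut runs on the same thread that is waiting for it; the non-blocking variant is included only for contrast.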
> testReentrantLockConstantTopologyChangeNonFailoverSafe can hang in case of broken tx
> ------------------------------------------------------------------------------------
>
> Key: IGNITE-9053
> URL: https://issues.apache.org/jira/browse/IGNITE-9053
> Project: Ignite
> Issue Type: Bug
> Components: data structures
> Affects Versions: 2.5
> Reporter: Anton Vinogradov
> Assignee: Anton Vinogradov
> Priority: Critical
> Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> -GridCachePartitionedDataStructuresFailoverSelfTest#testReentrantLockConstantTopologyChangeNonFailoverSafe
> -GridCachePartitionedDataStructuresFailoverSelfTest#testCountDownLatchConstantTopologyChange
>
> can hang in case of broken tx
> {noformat}
> Pending transactions:
> [2018-07-15 14:13:41,210][WARN
> ][exchange-worker-#1596354%partitioned.GridCachePartitionedDataStructuresFailoverSelfTest1%][diagnostic]
> >>> [txVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], exchWait=true,
> tx=GridDhtTxLocal [nearNodeId=1392b1bd-c807-4479-9bfe-fc9f70500000,
> nearFutId=14ffca0a461-999e75d0-a333-4bd6-a2a2-7f143d0af773, nearMiniId=1,
> nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion
> [topVer=143133203, order=1531653200153, nodeOrder=1],
> super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[],
> dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter
> [completedBase=null, sndTransformedVals=false, depEnabled=false,
> txState=IgniteTxStateImpl [activeCacheIds=[1968300681], recovery=false,
> txMap=[IgniteTxEntry [key=KeyCacheObjectImpl [part=494,
> val=GridCacheInternalKeyImpl [name=structure,
> grpName=default-volatile-ds-group], hasValBytes=true], cacheId=1968300681,
> txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=494,
> val=GridCacheInternalKeyImpl [name=structure,
> grpName=default-volatile-ds-group], hasValBytes=true], cacheId=1968300681],
> val=[op=NOOP, val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP,
> val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[],
> part=494, super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=KeyCacheObjectImpl [part=494, val=GridCacheInternalKeyImpl
> [name=structure, grpName=default-volatile-ds-group], hasValBytes=true],
> val=CacheObjectImpl [val=null, hasValBytes=true], ver=GridCacheVersion
> [topVer=143133201, order=1531653200154, nodeOrder=2], hash=2095426867,
> extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc
> [locs=[GridCacheMvccCandidate [nodeId=1bf28b00-feed-412b-a20b-ca9fc1100001,
> ver=GridCacheVersion [topVer=143133203, order=1531653200157, nodeOrder=2],
> threadId=1947290, id=31143709, topVer=AffinityTopologyVersion [topVer=7,
> minorTopVer=0], reentry=null,
> otherNodeId=1392b1bd-c807-4479-9bfe-fc9f70500000, otherVer=GridCacheVersion
> [topVer=143133203, order=1531653200153, nodeOrder=1], mappedDhtNodes=null,
> mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl
> [part=494, val=GridCacheInternalKeyImpl [name=structure,
> grpName=default-volatile-ds-group], hasValBytes=true],
> masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0,
> prevVer=null, nextVer=null]], rmts=null]], flags=2]]], prepared=0,
> locked=false, nodeId=null, locMapped=false, expiryPlc=null,
> transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null,
> xidVer=GridCacheVersion [topVer=143133203, order=1531653200157,
> nodeOrder=2]]]], super=IgniteTxAdapter [xidVer=GridCacheVersion
> [topVer=143133203, order=1531653200157, nodeOrder=2], writeVer=null,
> implicit=false, loc=true, threadId=1947290, startTime=1531653200578,
> nodeId=1bf28b00-feed-412b-a20b-ca9fc1100001, startVer=GridCacheVersion
> [topVer=143133203, order=1531653200157, nodeOrder=2], endVer=null,
> isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0,
> sysInvalidate=false, sys=true, plc=2, commitVer=null, finalizing=NONE,
> invalidParts=null, state=ACTIVE, timedOut=false,
> topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], duration=20632ms,
> onePhaseCommit=false], size=1]]]]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)