[
https://issues.apache.org/jira/browse/IGNITE-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758168#comment-16758168
]
Ivan Pavlukhin commented on IGNITE-11148:
-----------------------------------------
A flaw was found in the initialization of the compound
{{PartitionCountersNeighborcastFuture}}. A response from a remote node could be
received before the corresponding mini future was added to the parent. As a
result, the mini future, and therefore the parent future, was never completed.
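
To illustrate the ordering issue, here is a minimal sketch. It is not the actual
Ignite code: {{CompoundFutureSketch}}, {{init}}, {{onResponse}} and the
{{sendRequest}} callback are hypothetical names, and {{CompletableFuture}} stands
in for Ignite's internal future classes. It only shows the general idea that
every mini future is registered before any request is sent, so even an
immediate response finds its mini future.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical stand-in for a compound future; not Ignite's actual class. */
class CompoundFutureSketch {
    /** Mini futures keyed by the remote node they wait for. */
    private final Map<UUID, CompletableFuture<Void>> minis = new ConcurrentHashMap<>();

    /**
     * Registers a mini future for every node first and only then sends requests,
     * so a response can never arrive for a mini future that is not registered yet.
     */
    CompletableFuture<Void> init(Iterable<UUID> nodes, Consumer<UUID> sendRequest) {
        // Step 1: register all mini futures.
        for (UUID nodeId : nodes)
            minis.put(nodeId, new CompletableFuture<>());

        // Step 2: the parent completes once every mini future has completed.
        CompletableFuture<Void> parent =
            CompletableFuture.allOf(minis.values().toArray(new CompletableFuture[0]));

        // Step 3: only now send requests; responses may arrive at any moment.
        for (UUID nodeId : minis.keySet())
            sendRequest.accept(nodeId);

        return parent;
    }

    /** Returns false if the response matched no pending mini future (a lost response). */
    boolean onResponse(UUID nodeId) {
        CompletableFuture<Void> mini = minis.get(nodeId);

        return mini != null && mini.complete(null);
    }
}
```

If the send in step 3 happened before the registration in step 1, {{onResponse}}
could return false for a fast reply, and the parent would wait forever, which
matches the hang described in this issue.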
> PartitionCountersNeighborcastFuture blocks partition map exchange
> ------------------------------------------------------------------
>
> Key: IGNITE-11148
> URL: https://issues.apache.org/jira/browse/IGNITE-11148
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Stepachev Maksim
> Assignee: Ivan Pavlukhin
> Priority: Major
> Labels: Faillover, Hanging, Transactions,
> mvcc_stabilization_stage_1
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We investigated an "execution timeout" failure in Continuous Query 2 for
> *CacheContinuousQueryAsyncFailoverMvccTxSelfTest.testMultiThreadedFailover*.
> The investigation pointed to an MVCC problem: the test blocks at *getAndPut*
> because at some point the following incorrect behavior occurred:
> {code:java}
> [16:02:56] : [Step 4/5] [2019-01-30 13:02:56,923][INFO
> ][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][IgniteTxManager]
> Finishing prepared transaction [commit=false, tx=GridDhtTxRemote
> [nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b900004,
> rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4,
> nearXidVer=GridCacheVersion [topVer=160333378, order=1548853376060,
> nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter
> [explicitVers=null, started=true, commitAllowed=0,
> txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {},
> writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter
> [xidVer=GridCacheVersion [topVer=160333378, order=1548853376061,
> nodeOrder=3], writeVer=GridCacheVersion [topVer=160333378,
> order=1548853376062, nodeOrder=3], implicit=false, loc=false, threadId=21,
> startTime=1548853376731, nodeId=3e6881c0-1e96-42a9-8bd1-55d344c00002,
> startVer=GridCacheVersion [topVer=160333378, order=1548853376060,
> nodeOrder=1], endVer=null, isolation=REPEATABLE_READ,
> concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2,
> commitVer=GridCacheVersion [topVer=160333378, order=1548853376061,
> nodeOrder=3], finalizing=NONE, invalidParts=null, state=PREPARED,
> timedOut=false, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0],
> mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207,
> cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null,
> duration=191ms, onePhaseCommit=false]]]]{code}
> and after that:
> {code:java}
> [16:02:56] : [Step 4/5] [2019-01-30 13:02:56,931][INFO
> ][sys-stripe-6-#9%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][recovery]
> Starting delivery partition countres to remote nodes [txId=GridCacheVersion
> [topVer=160333378, order=1548853376060, nodeOrder=5],
> futId=82cfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4{code}
> _!IMPORTANT - we work with PartitionCountersNeighborcastFuture, which *doesn't
> provide status information* (monitoring)._
> One possible location of the problem:
> PartitionCountersNeighborcastFuture.onNodeLeft
> As a result, we have a transaction in *state=PREPARED* with *completionTime=0*
> that never completes:
>
> {code:java}
> [16:03:16]W: [org.apache.ignite:ignite-indexing] [2019-01-30
> 13:03:16,776][WARN
> ][exchange-worker-#40%continuous.CacheContinuousQueryAsyncFailoverMvccTxSelfTest0%][diagnostic]
> Failed to wait for partition release future [topVer=AffinityTopologyVersion
> [topVer=8, minorTopVer=0], node=18519119-475a-448f-8c02-ff1f64900000]
> LocalTxReleaseFuture [
> topVer=AffinityTopologyVersion [topVer=8, minorTopVer=0],
> futures=[
> TxFinishFuture [
> tx=GridDhtTxRemote [
> nearNodeId=6a8546ab-f09d-4b0c-91c1-5fcf5b900004,
> rmtFutId=95bfade9861-4f5107b4-70e5-44ef-96d4-1b18cd6b57e4,
> nearXidVer=GridCacheVersion [topVer=160333378, order=1548853376060,
> nodeOrder=5], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter
> [explicitVers=null, started=true, commitAllowed=0,
> txState=IgniteTxRemoteStateImpl [readMap=EmptyMap {},
> writeMap=ConcurrentLinkedHashMap {}], txLbl=null, super=IgniteTxAdapter [
> xidVer=GridCacheVersion [topVer=160333378, order=1548853376061,
> nodeOrder=3],
> writeVer=GridCacheVersion [topVer=160333378, order=1548853376062,
> nodeOrder=3], implicit=false, loc=false, threadId=21,
> startTime=1548853376731, nodeId=3e6881c0-1e96-42a9-8bd1-55d344c00002,
> startVer=GridCacheVersion [topVer=160333378, order=1548853376060,
> nodeOrder=1], endVer=null, isolation=REPEATABLE_READ,
> concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2,
> commitVer=GridCacheVersion [topVer=160333378, order=1548853376061,
> nodeOrder=3], finalizing=RECOVERY_FINISH, invalidParts=null, state=PREPARED,
> timedOut=false, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0],
> mvccSnapshot=MvccSnapshotWithoutTxs [crdVer=1548853371043, cntr=207,
> cleanupVer=204, opCntr=0], skipCompletedVers=false, parentTx=null,
> duration=20048ms, onePhaseCommit=false]]], completionTime=0, duration=20048]
> {code}
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)