[ https://issues.apache.org/jira/browse/IGNITE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Ozerov updated IGNITE-6939:
------------------------------------
    Component/s: sql
                 cache

> Exclude false owners from the execution plan based on query response
> --------------------------------------------------------------------
>
>                 Key: IGNITE-6939
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6939
>             Project: Ignite
>          Issue Type: Task
>      Security Level: Public (Viewable by anyone)
>          Components: cache, sql
>            Reporter: Alexey Goncharuk
>             Fix For: 2.4
>
>
> This is related to IGNITE-6858; the fix in that ticket can be improved.
> The scenario leading to the issue is as follows:
> 1) Node A has partition 1 in OWNING state.
> 2) Node B has a local partition map in which partition 1 on node A is OWNING.
> 3) A topology change is triggered that would move partition 1 from A to
> another node; the topology version is X.
> 4) A transaction is started on node B on topology X.
> 5) The partition is rebalanced: node A moves partition 1 to RENTING and then
> to EVICTED state, and node A updates its local partition map.
> 6) A new topology change is triggered.
> 7) Node A sends the updated partition map (transitively) to node B, but since
> there is a pending exchange, node B ignores the updated map and still thinks
> that A owns partition 1 [1].
> 8) The transaction attempts to execute an SQL query against partition 1 on
> node A and retries infinitely.
> [1] The related code is in
> GridDhtPartitionTopologyImpl#update(AffinityTopologyVersion,
> GridDhtPartitionFullMap, CachePartitionFullCountersMap, Set,
> AffinityTopologyVersion):
> {code}
> if (stopping || !lastTopChangeVer.initialized() ||
>     // Ignore message not-related to exchange if exchange is in progress.
>     (exchangeVer == null && !lastTopChangeVer.equals(readyTopVer)))
>     return false;
> {code}
> There are two possible ways to fix this:
> 1) Make all updates to the partition map in a single thread. Then we will
> not need update sequences, and we can update the local partition map even
> when there is a pending exchange (a relatively big, but useful change).
> 2) Change SQL query execution so that if a node cannot reserve a partition,
> the partition is no longer mapped to that node on the same topology version
> (a quick fix).
> This will remove the need to throw an exception from an SQL query inside a
> transaction when there is a pending exchange.

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)
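The quick fix (option 2) can be sketched in Java roughly as follows. This is a minimal, hedged illustration under simplified assumptions, not Ignite's implementation: the class `FalseOwnerFilter`, its methods, the `long` topology version (Ignite itself uses `AffinityTopologyVersion`) and the string node IDs are all hypothetical. When a node's query response says it could not reserve a partition, that node is recorded as a false owner of the partition for the current topology version, and later mapping attempts on the same version skip it. A newer topology version clears the exclusions, on the assumption that the exchange for that version delivers an up-to-date partition map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of option 2, not Ignite's API: nodes that fail to reserve
 * a partition are not mapped to that partition again on the same topology version.
 */
public class FalseOwnerFilter {
    /** Topology version the recorded exclusions belong to (simplified to a long). */
    private long topVer = -1;

    /** Partition ID -> IDs of nodes that could not reserve it on topVer. */
    private final Map<Integer, Set<String>> falseOwners = new HashMap<>();

    /** Records a "cannot reserve partition" response; reports for stale versions are ignored. */
    public synchronized void onReservationFailed(long ver, int part, String nodeId) {
        resetIfNewer(ver);
        if (ver == topVer)
            falseOwners.computeIfAbsent(part, p -> new HashSet<>()).add(nodeId);
    }

    /** Returns the owners from the (possibly stale) partition map minus known false owners. */
    public synchronized List<String> candidates(long ver, int part, List<String> owners) {
        resetIfNewer(ver);
        Set<String> excluded = Set.of();
        if (ver == topVer)
            excluded = falseOwners.getOrDefault(part, Set.of());
        List<String> res = new ArrayList<>();
        for (String node : owners) {
            if (!excluded.contains(node))
                res.add(node);
        }
        return res;
    }

    /** Exclusions are valid for a single topology version; a newer version starts clean. */
    private void resetIfNewer(long ver) {
        if (ver > topVer) {
            topVer = ver;
            falseOwners.clear();
        }
    }

    public static void main(String[] args) {
        FalseOwnerFilter filter = new FalseOwnerFilter();
        List<String> owners = List.of("A", "C"); // stale map still lists A as an owner
        filter.onReservationFailed(5, 1, "A");   // A answers: cannot reserve partition 1
        System.out.println(filter.candidates(5, 1, owners)); // prints [C]
        System.out.println(filter.candidates(6, 1, owners)); // prints [A, C]
    }
}
```

With a stale map like the one in step 7 (A still listed as an owner of partition 1), the query is remapped away from A on the same topology version instead of being retried against A indefinitely. If every listed owner has been excluded, the returned list is empty; in this sketch it is left to the caller to wait for the next topology version rather than retry.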