[ 
https://issues.apache.org/jira/browse/IGNITE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6939:
------------------------------------
    Component/s: sql
                 cache

> Exclude false owners from the execution plan based on query response
> --------------------------------------------------------------------
>
>                 Key: IGNITE-6939
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6939
>             Project: Ignite
>          Issue Type: Task
>      Security Level: Public(Viewable by anyone) 
>          Components: cache, sql
>            Reporter: Alexey Goncharuk
>             Fix For: 2.4
>
>
> This is related to IGNITE-6858; the fix in that ticket can be improved.
> The scenario leading to the issue is as follows:
> 1) Node A has partition 1 as owning
> 2) Node B has local partition map which has partition 1 on node A as owning
> 3) Topology change is triggered which would move partition 1 from A to 
> another node, topology version is X
> 4) A transaction is started on node B on topology X
> 5) Partition is rebalanced and node A moves partition 1 to RENTING and then 
> to EVICTED state, node A updates its local partition map.
> 6) A new topology change is triggered
> 7) Node A sends partition map (transitively) to the node B, but since there 
> is a pending exchange, node B ignores the updated map and still thinks that A 
> owns partition 1 [1]
> 8) The transaction attempts to execute an SQL query against partition 1 on 
> node A and retries infinitely
> [1] The related code is in 
> GridDhtPartitionTopologyImpl#update(AffinityTopologyVersion, 
> GridDhtPartitionFullMap, CachePartitionFullCountersMap, Set, 
> AffinityTopologyVersion)
> {code}
> if (stopping || !lastTopChangeVer.initialized() ||
>     // Ignore message not-related to exchange if exchange is in progress.
>     (exchangeVer == null && !lastTopChangeVer.equals(readyTopVer)))
>     return false;
> {code}
> There are two possibilities to fix this:
> 1) Make all updates to partition map in a single thread, then we will not 
> need update sequences and then we can update local partition map even when 
> there is a pending exchange (this is a relatively big, but useful change)
> 2) Make a change in SQL query execution so that if a node cannot reserve a 
> partition, do not map the partition to this node on the same topology version 
> anymore (a quick fix)
> This will remove the need to throw an exception from an SQL query inside a 
> transaction when there is a pending exchange.
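
The quick fix in option 2 could be sketched roughly as follows. This is an illustration only, not Ignite code: the class FalseOwnerExclusionSketch, its methods, the node names "A" and "C", and the topology version 10 are all hypothetical, and it assumes the mapping node knows at least one other candidate owner for the partition. The idea is to remember, per topology version, which nodes reported that they could not reserve a partition, and to skip those nodes when the query is mapped again on the same version.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of option 2; none of these names are Ignite APIs. */
class FalseOwnerExclusionSketch {
    /** Candidate owners of each partition as seen by the (possibly stale) local partition map. */
    private final Map<Integer, List<String>> localPartMap;

    /** Nodes that failed to reserve a partition, keyed by topology version, then by partition. */
    private final Map<Long, Map<Integer, Set<String>>> falseOwners = new HashMap<>();

    FalseOwnerExclusionSketch(Map<Integer, List<String>> localPartMap) {
        this.localPartMap = localPartMap;
    }

    /** Records that a node reported it could not reserve the partition on this topology version. */
    void onReservationFailed(long topVer, int part, String nodeId) {
        falseOwners.computeIfAbsent(topVer, v -> new HashMap<>())
            .computeIfAbsent(part, p -> new HashSet<>())
            .add(nodeId);
    }

    /** Picks the first listed owner that has not been excluded for this topology version. */
    Optional<String> mapPartition(long topVer, int part) {
        Set<String> excluded = falseOwners
            .getOrDefault(topVer, Map.of())
            .getOrDefault(part, Set.of());

        return localPartMap.getOrDefault(part, List.of()).stream()
            .filter(node -> !excluded.contains(node))
            .findFirst();
    }

    public static void main(String[] args) {
        // Node B's stale view (assumed): A is still listed for partition 1, alongside C.
        FalseOwnerExclusionSketch mapper =
            new FalseOwnerExclusionSketch(Map.of(1, List.of("A", "C")));

        long topVer = 10L; // Stands in for topology version X from the scenario.

        System.out.println(mapper.mapPartition(topVer, 1)); // First attempt is mapped to A.

        mapper.onReservationFailed(topVer, 1, "A"); // A answers that it cannot reserve partition 1.

        System.out.println(mapper.mapPartition(topVer, 1)); // The retry skips A.
    }
}
```

Because exclusions are keyed by topology version, they stop applying once the query is mapped on a newer version, by which point the local partition map should have been refreshed. If every candidate is excluded, mapPartition returns an empty Optional and the caller still has to decide whether to wait for the exchange or fail.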



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
