Alexey Goncharuk created IGNITE-1239:
----------------------------------------

             Summary: Cache partition iterator throws exception when concurrent rebalancing is running
                 Key: IGNITE-1239
                 URL: https://issues.apache.org/jira/browse/IGNITE-1239
             Project: Ignite
          Issue Type: Bug
          Components: cache
            Reporter: Alexey Goncharuk


I observed this exception while IgniteRDD was iterating over a partition and 
two new nodes joined the cluster:
{code}
Caused by: class org.apache.ignite.IgniteCheckedException: Query execution failed: GridCacheQueryBean [qry=GridCacheQueryAdapter [type=SCAN, clsName=null, clause=null, filter=org.apache.ignite.internal.processors.cache.IgniteCacheProxy$1@6490c94c, part=138, incMeta=false, metrics=GridCacheQueryMetricsAdapter [minTime=10, maxTime=10, avgTime=10.0, execs=1, fails=1, executed=true], pageSize=1024, timeout=0, keepAll=true, incBackups=false, dedup=false, prj=null, keepPortable=false, subjId=9cdc9751-c6ec-43eb-968a-e941f2a1a8cd, taskHash=0], rdc=null, trans=null]
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.checkError(GridCacheQueryFutureAdapter.java:245)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.internalIterator(GridCacheQueryFutureAdapter.java:303)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.next(GridCacheQueryFutureAdapter.java:156)
        ... 17 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to execute query on node [query=GridCacheQueryBean [qry=GridCacheQueryAdapter [type=SCAN, clsName=null, clause=null, filter=org.apache.ignite.internal.processors.cache.IgniteCacheProxy$1@6490c94c, part=138, incMeta=false, metrics=GridCacheQueryMetricsAdapter [minTime=0, maxTime=0, avgTime=0.0, execs=0, fails=0, executed=false], pageSize=1024, timeout=0, keepAll=true, incBackups=false, dedup=false, prj=null, keepPortable=false, subjId=9cdc9751-c6ec-43eb-968a-e941f2a1a8cd, taskHash=0], rdc=null, trans=null], nodeId=963d0e35-7805-4b6d-8d64-22cce84e35f2]
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.onPage(GridCacheQueryFutureAdapter.java:370)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.processQueryResponse(GridCacheDistributedQueryManager.java:377)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.access$000(GridCacheDistributedQueryManager.java:44)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$1.apply(GridCacheDistributedQueryManager.java:74)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$1.apply(GridCacheDistributedQueryManager.java:72)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:534)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:240)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:48)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1026)
        at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2256)
        at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:946)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:60)
        at org.apache.ignite.internal.managers.communication.GridIoManager$6.run(GridIoManager.java:915)
        ... 3 more
Caused by: class org.apache.ignite.IgniteCheckedException: Partition can't be reserved
        at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:6808)
{code}

The issue is that the query request was sent to a backup node, and by the time 
the request arrived, the partition had already been evicted, which resulted in 
the "Partition can't be reserved" exception. We should automatically retry when 
this exception is encountered.

I believe we already have retry logic for this case, but it looks like there is 
a bug in it.
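
To make the expected behavior concrete, here is a minimal sketch of the kind of retry described above. It is not Ignite's actual retry logic and uses no Ignite classes: PartitionNotReservedException, scanWithRetry, and the backoff parameters are all hypothetical names introduced for illustration. The sketch assumes the failure can be recognized somewhere in the cause chain, as in the stack trace above.

```java
import java.util.concurrent.Callable;

public class PartitionScanRetry {
    /** Hypothetical stand-in for the "Partition can't be reserved" failure. */
    public static class PartitionNotReservedException extends Exception {
        public PartitionNotReservedException(String msg) {
            super(msg);
        }
    }

    /** Returns true if the exception or any of its causes is the reservation failure. */
    static boolean isPartitionNotReserved(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof PartitionNotReservedException)
                return true;
        }
        return false;
    }

    /**
     * Runs the scan and retries only when the partition could not be reserved.
     * Any other failure is rethrown immediately.
     *
     * @param scan Hypothetical callable that executes the partition scan.
     * @param maxAttempts Maximum number of attempts, at least 1.
     * @param backoffMs Base delay; the wait grows linearly with the attempt number.
     */
    public static <T> T scanWithRetry(Callable<T> scan, int maxAttempts, long backoffMs)
        throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        Exception last = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return scan.call();
            }
            catch (Exception e) {
                if (!isPartitionNotReserved(e))
                    throw e; // Unrelated failure: do not retry.

                last = e;

                // Give rebalancing some time before asking again.
                if (attempt < maxAttempts)
                    Thread.sleep(backoffMs * attempt);
            }
        }

        throw last; // All attempts hit the reservation failure.
    }
}
```

In a real fix the retry would presumably also have to re-resolve which node currently owns the partition before resending the request, since the node that evicted it will keep failing; the sketch leaves that to the supplied callable.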



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
