[ https://issues.apache.org/jira/browse/IGNITE-28623 ]


    Dmitry Pavlov deleted comment on IGNITE-28623:
    ----------------------------------------

was (Author: dpavlov):
Likely Root Cause

This looks like a flaky topology/discovery test race, not PR-caused. The failed 
test has no TeamCity assertion text, 0s duration, and an empty failure detail, 
which fits a timeout/orphaned test report more than a deterministic code 
assertion.

The suspicious local code is in 
GridCacheAbstractDataStructuresFailoverSelfTest.java (line 1262): it creates an 
atomic sequence on a client, then increments it while 
ConstantTopologyChangeWorker starts nodes and, with circular=true, stops 
G.allGrids().get(0) (line 1415). G.allGrids() comes from a 
ConcurrentHashMap values iteration, so that “first” grid is not a stable or 
semantically safe choice. Under churn, it can stop the client or the wrong 
joining/server node.

This exact test family has a long history as a source of hangs: IGNITE-8783 names 
GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange. 
Local history also has fixes such as "IGNITE-9429 Fixed flaky 
GridCacheReplicatedDataStructuresFailoverSelfTest".

Current Change?

Likely not. PR #13089 is Calcite-only: it touches ClosableIteratorsHolder, 
RootNode, and CancelTest, and no data-structures, discovery, cache exchange, or 
atomic sequence classes. The PR title, as mirrored in the Apache notification 
archive, is also SQL Calcite related.

Minimal Fix

In ConstantTopologyChangeWorker, replace stopGrid(G.allGrids().get(0)...) with 
deterministic selection of the oldest non-client server node, ideally by 
discovery order, and stop that node. Inspect:


GridCacheAbstractDataStructuresFailoverSelfTest.java (line 1385)

GridCacheAtomicSequenceImpl.java (line 168)

DataStructuresProcessor.java (line 549)

ExchangeLatchManager.java (line 214)
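
The selection rule proposed above can be sketched as follows. This is only an illustration under stated assumptions: NodeInfo and pickNodeToStop are hypothetical names, not part of the Ignite test framework, and the sketch assumes each running grid can be reduced to a name, a client flag, and a discovery order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class StopNodeSelection {
    /** Hypothetical stand-in for what the test knows about one running grid. */
    public static final class NodeInfo {
        public final String name;
        public final boolean client;
        public final long order; // Assumed: discovery join order, lower means older.

        public NodeInfo(String name, boolean client, long order) {
            this.name = name;
            this.client = client;
            this.order = order;
        }
    }

    /**
     * Picks the oldest non-client node, independent of list iteration order.
     * Returns an empty Optional when only client nodes are running.
     */
    public static Optional<NodeInfo> pickNodeToStop(List<NodeInfo> nodes) {
        return nodes.stream()
            .filter(n -> !n.client)
            .min(Comparator.comparingLong((NodeInfo n) -> n.order));
    }
}
```

Because the result depends only on the client flag and the order value, the same node is chosen regardless of where it appears in the list, which is the property that get(0) does not provide.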


Classification: flaky test/topology race, possibly exposed by TeamCity 
timing/JDK/runtime conditions, not caused by the current Calcite change.

> Calcite engine. Query iterator can be prematurely closed, before returning 
> result to the user
> ---------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28623
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28623
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Aleksey Plekhanov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, calcite, ise
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Calcite engine uses reference tracking to determine when there are no more 
> references to an iterator and the query can be closed (see 
> {{ClosableIteratorsHolder}}), but sometimes a query can be closed prematurely: 
> there are no references left to the iterator, yet the delegating iterator is 
> still processing data.
> For example, calls like {{cache.query(qry).iterator().next()}} (create an 
> iterator, don't store it, make one delegated call) can fail if GC runs 
> concurrently.
> This issue is also reproduced in the test: 
> {{SqlPlanHistoryIntegrationTest.testDefaultHistorySize[sqlEngine=calcite, 
> isClient=false loc=false, isFullyFetched=false, isPerfStatsEnabled=false]}}
> With error:
> {noformat}
> class org.apache.ignite.internal.processors.query.IgniteSQLException: An 
> error occurred while query executing - The query was cancelled while 
> executing.
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.checkException(RootNode.java:325)
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.exchangeBuffers(RootNode.java:314)
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.hasNext(RootNode.java:213)
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.next(RootNode.java:220)
>       at 
> org.apache.ignite.internal.processors.query.calcite.util.ConvertingClosableIterator.next(ConvertingClosableIterator.java:78)
>       at 
> org.apache.ignite.internal.processors.query.calcite.util.ConvertingClosableIterator.next(ConvertingClosableIterator.java:33)
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.ClosableIteratorsHolder$DelegatingIterator.next(ClosableIteratorsHolder.java:149)
>       at 
> org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.cacheQuery(SqlPlanHistoryIntegrationTest.java:650)
>       at 
> org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.checkDefaultHistorySize(SqlPlanHistoryIntegrationTest.java:857)
>       at 
> org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.testDefaultHistorySize(SqlPlanHistoryIntegrationTest.java:504)
>       at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2486)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: class 
> org.apache.ignite.internal.processors.query.IgniteSQLException: The query was 
> cancelled while executing.
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.close(RootNode.java:119)
>       at 
> org.apache.ignite.internal.processors.query.calcite.util.Commons.close(Commons.java:337)
>       at 
> org.apache.ignite.internal.processors.query.calcite.util.ConvertingClosableIterator.close(ConvertingClosableIterator.java:95)
>       at 
> org.apache.ignite.internal.util.CommonUtils.close(CommonUtils.java:1000)
>       at 
> org.apache.ignite.internal.processors.query.calcite.util.Commons.close(Commons.java:345)
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.ClosableIteratorsHolder.cleanUp(ClosableIteratorsHolder.java:86)
>       at 
> org.apache.ignite.internal.processors.query.calcite.exec.ClosableIteratorsHolder.lambda$init$0(ClosableIteratorsHolder.java:71)
>       ... 1 more
> Caused by: class org.apache.ignite.cache.query.QueryCancelledException: The 
> query was cancelled while executing.
>       ... 8 more
> {noformat}
> It also fails on TC: https://ci2.ignite.apache.org/viewLog.html?buildId=9034083
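
The scenario in the quoted description (a wrapper iterator that is never stored becomes unreachable while its delegated call is still running) has a general Java 9+ guard, Reference.reachabilityFence. The sketch below only illustrates that technique; FencedDelegatingIterator is a hypothetical class, and nothing here is taken from the actual fix, whose approach this message does not describe.

```java
import java.lang.ref.Reference;
import java.util.Iterator;

public class FencedDelegatingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;

    public FencedDelegatingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean hasNext() {
        try {
            return delegate.hasNext();
        }
        finally {
            // Keeps this wrapper strongly reachable until the delegate call has
            // returned, so reference-based cleanup cannot observe it as
            // collected in the middle of the call.
            Reference.reachabilityFence(this);
        }
    }

    @Override public T next() {
        try {
            return delegate.next();
        }
        finally {
            Reference.reachabilityFence(this);
        }
    }
}
```

With such a wrapper, a one-shot call like new FencedDelegatingIterator<>(list.iterator()).next() keeps the wrapper reachable for the full duration of next(), even though the caller never stores it.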



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
