[
https://issues.apache.org/jira/browse/IGNITE-28623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077683#comment-18077683
]
Dmitry Pavlov commented on IGNITE-28623:
----------------------------------------
Failed Test:
org.apache.ignite.testsuites.IgniteCacheDataStructuresSelfTestSuite:
org.apache.ignite.internal.processors.cache.datastructures.replicated.GridCacheReplicatedDataStructuresFailoverSelfTest.testAtomicSequenceConstantTopologyChange
Likely Root Cause
This looks like a flaky topology/discovery test race, not PR-caused. The failed
test has no TeamCity assertion text, 0s duration, and an empty failure detail,
which fits a timeout/orphaned test report more than a deterministic code
assertion.
The suspicious local code is in
GridCacheAbstractDataStructuresFailoverSelfTest.java (line 1262): it creates an
atomic sequence on a client, then increments it while
ConstantTopologyChangeWorker starts nodes and, with circular=true, stops
G.allGrids().get(0) at line 1415. G.allGrids() comes from a
ConcurrentHashMap values iteration, so that “first” grid is not a stable or
semantically safe choice. Under churn, it can stop the client or the wrong
joining/server node.
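The ordering concern can be illustrated with a small standalone sketch. It
does not use Ignite at all: the class GridOrderSketch, the snapshot helper,
and the node names are hypothetical, and plain strings stand in for grids. It
only shows that values() of a ConcurrentHashMap carries no start-order
guarantee, while an insertion-ordered map does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GridOrderSketch {
    /** Copies the map's values in whatever order the map iterates them. */
    static List<String> snapshot(Map<String, String> grids) {
        return new ArrayList<>(grids.values());
    }

    public static void main(String[] args) {
        // Hypothetical node names, registered in start order.
        String[] startOrder = {"server-0", "client", "server-1", "server-2"};

        Map<String, String> hashed = new ConcurrentHashMap<>();
        Map<String, String> ordered = new LinkedHashMap<>();

        for (String name : startOrder) {
            hashed.put(name, name);
            ordered.put(name, name);
        }

        // Iteration order here depends on key hashes, not on start order,
        // so element 0 may be any of the four entries, including "client".
        System.out.println("ConcurrentHashMap first: " + snapshot(hashed).get(0));

        // LinkedHashMap preserves insertion order, so element 0 is "server-0".
        System.out.println("LinkedHashMap first: " + snapshot(ordered).get(0));
    }
}
```

Whichever element the hashed map happens to return first, the point is that
the test cannot rely on it being the oldest server.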
This test family has a long history as a hang source: IGNITE-8783 names
GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange.
Local history also has fixes such as "IGNITE-9429 Fixed flaky
GridCacheReplicatedDataStructuresFailoverSelfTest".
Current Change?
Likely not. PR #13089 is Calcite-only: it touches ClosableIteratorsHolder,
RootNode, and CancelTest, and no data-structures, discovery, cache exchange, or
atomic sequence classes. The PR title is also Calcite SQL related (PR #13089,
mirrored in the Apache notification archive).
Minimal Fix
In ConstantTopologyChangeWorker, replace stopGrid(G.allGrids().get(0)...) with
a deterministic selection of the oldest non-client server node, ideally by
discovery order, and stop that node. Inspect:
GridCacheAbstractDataStructuresFailoverSelfTest.java (line 1385)
GridCacheAtomicSequenceImpl.java (line 168)
DataStructuresProcessor.java (line 549)
ExchangeLatchManager.java (line 214)
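A minimal sketch of the deterministic selection proposed under Minimal Fix,
written without Ignite dependencies so it runs standalone. NodeInfo,
oldestServer, and the sample nodes are hypothetical stand-ins. In the real
test the order and client fields would presumably come from
ClusterNode.order() and ClusterNode.isClient(), but that mapping is an
assumption and has not been checked against the test code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class OldestServerSketch {
    /** Hypothetical stand-in for a started grid. */
    static final class NodeInfo {
        final String name;
        final long order;      // discovery order: lower means joined earlier
        final boolean client;  // true for client nodes

        NodeInfo(String name, long order, boolean client) {
            this.name = name;
            this.order = order;
            this.client = client;
        }
    }

    /** Picks the non-client node with the lowest discovery order, if any. */
    static Optional<NodeInfo> oldestServer(List<NodeInfo> nodes) {
        return nodes.stream()
            .filter(n -> !n.client)
            .min(Comparator.comparingLong(n -> n.order));
    }

    public static void main(String[] args) {
        // The client is listed first on purpose: get(0) would pick it.
        List<NodeInfo> nodes = List.of(
            new NodeInfo("client", 4, true),
            new NodeInfo("server-2", 3, false),
            new NodeInfo("server-0", 1, false),
            new NodeInfo("server-1", 2, false));

        // Prints "server-0" regardless of the list's iteration order.
        System.out.println(oldestServer(nodes).map(n -> n.name).orElse("none"));
    }
}
```

Returning an Optional makes the "no server left to stop" case explicit, so the
worker can skip a stop cycle instead of stopping the client.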
Classification: flaky test/topology race, possibly exposed by TeamCity
timing/JDK/runtime conditions, not caused by the current Calcite change.
> Calcite engine. Query iterator can be prematurely closed, before returning
> result to the user
> ---------------------------------------------------------------------------------------------
>
> Key: IGNITE-28623
> URL: https://issues.apache.org/jira/browse/IGNITE-28623
> Project: Ignite
> Issue Type: Bug
> Reporter: Aleksey Plekhanov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Labels: MakeTeamcityGreenAgain, calcite, ise
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The Calcite engine uses reference tracking to determine when there are no
> more references to an iterator and the query can be closed (see
> {{ClosableIteratorsHolder}}), but sometimes the query is closed prematurely:
> there are already no references to the iterator, yet the delegating iterator
> is still processing data.
> For example, calls like {{cache.query(qry).iterator().next()}} (create an
> iterator, don't store it, make one delegate call) can fail in case of
> concurrent GC.
> This issue is also reproduced in the test:
> {{SqlPlanHistoryIntegrationTest.testDefaultHistorySize[sqlEngine=calcite,
> isClient=false loc=false, isFullyFetched=false, isPerfStatsEnabled=false]}}
> With error:
> {noformat}
> class org.apache.ignite.internal.processors.query.IgniteSQLException: An error occurred while query executing - The query was cancelled while executing.
>     at org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.checkException(RootNode.java:325)
>     at org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.exchangeBuffers(RootNode.java:314)
>     at org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.hasNext(RootNode.java:213)
>     at org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.next(RootNode.java:220)
>     at org.apache.ignite.internal.processors.query.calcite.util.ConvertingClosableIterator.next(ConvertingClosableIterator.java:78)
>     at org.apache.ignite.internal.processors.query.calcite.util.ConvertingClosableIterator.next(ConvertingClosableIterator.java:33)
>     at org.apache.ignite.internal.processors.query.calcite.exec.ClosableIteratorsHolder$DelegatingIterator.next(ClosableIteratorsHolder.java:149)
>     at org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.cacheQuery(SqlPlanHistoryIntegrationTest.java:650)
>     at org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.checkDefaultHistorySize(SqlPlanHistoryIntegrationTest.java:857)
>     at org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.testDefaultHistorySize(SqlPlanHistoryIntegrationTest.java:504)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2486)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: class org.apache.ignite.internal.processors.query.IgniteSQLException: The query was cancelled while executing.
>     at org.apache.ignite.internal.processors.query.calcite.exec.rel.RootNode.close(RootNode.java:119)
>     at org.apache.ignite.internal.processors.query.calcite.util.Commons.close(Commons.java:337)
>     at org.apache.ignite.internal.processors.query.calcite.util.ConvertingClosableIterator.close(ConvertingClosableIterator.java:95)
>     at org.apache.ignite.internal.util.CommonUtils.close(CommonUtils.java:1000)
>     at org.apache.ignite.internal.processors.query.calcite.util.Commons.close(Commons.java:345)
>     at org.apache.ignite.internal.processors.query.calcite.exec.ClosableIteratorsHolder.cleanUp(ClosableIteratorsHolder.java:86)
>     at org.apache.ignite.internal.processors.query.calcite.exec.ClosableIteratorsHolder.lambda$init$0(ClosableIteratorsHolder.java:71)
>     ... 1 more
> Caused by: class org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled while executing.
>     ... 8 more
> {noformat}
> And fails on TC: https://ci2.ignite.apache.org/viewLog.html?buildId=9034083
--
This message was sent by Atlassian Jira
(v8.20.10#820010)