[jira] [Updated] (IGNITE-24016) [aimem] Restart node with open transaction leads to `Failed to get the primary replica` after node is started

Andrey Khitrin (Jira) Tue, 04 Feb 2025 02:49:57 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-24016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andrey Khitrin updated IGNITE-24016:
------------------------------------
    Affects Version/s: 3.0

> [aimem] Restart node with open transaction leads to `Failed to get the 
> primary replica` after node is started
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-24016
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24016
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>    Affects Versions: 3.0, 3.0.0-beta1
>         Environment: 3 nodes (each node is CMG, each node "-Xms512m", 
> "-Xmx1536m"), each on separate host. Each host vCPU: 4, Memory: 32GB.
>            Reporter: Igor
>            Priority: Major
>              Labels: ignite-3
>         Attachments: cluster logs.zip
>
>
> *Steps to reproduce:*
>  # Start 3 nodes (each node is CMG, each node "-Xms512m", "-Xmx1536m"), each 
> on separate host. Each host vCPU: 4, Memory: 32GB.
>  # Create 1 table
>  ## Execute query: create zone if not exists "cluster_failover_3" with 
> replicas=3, data_nodes_auto_adjust_scale_up=10, 
> data_nodes_auto_adjust_scale_down=10, storage_profiles='default_aimem'
>  ## Execute query: create TABLE failoverTest00(k1 INTEGER not null, k2 
> INTEGER not null, v1 VARCHAR(100), v2 VARCHAR(255), v3 TIMESTAMP not null, 
> primary key (k1, k2)) ZONE "cluster_failover_3"
>  # Fill the table with 1000 rows
>  # Wait until the local state of all partitions of all tables is "HEALTHY"
>  # Wait until the global state of all partitions of all tables is "AVAILABLE"
>  # Assert that the tables have been filled, with the expected row count of 
> '1000' and no errors in logs
>  # Assert that the Ignite log contains no errors or exceptions
>  # Kill the node and start it again with open transactions:
>  ## Begin IgniteSql transaction and execute insert
>  ## Begin KeyValueView transaction and execute insert
>  ## Begin JDBC transaction and execute insert
>  ## Kill node and start it again
>  # Wait until the node is ready after restart
>  # Assert that the physical and logical topologies are correct
>  # Assert that the tables have been filled.
> *Expected:*
> All data present and consistent.
> *Actual:*
> Exception on step 11:
> {code:java}
> java.sql.SQLException: java.sql.SQLException: Failed to get the primary replica [tablePartitionId=17_part_2]
> java.sql.SQLException: Failed to get the primary replica [tablePartitionId=17_part_2]
>   at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>   at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:160)
>   at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:115)
>   at org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:113)
>   at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.tryGetActualResult(ClusterFailoverTestBase.java:340)
>   at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.lambda$getActualResult$7(ClusterFailoverTestBase.java:319)
>   at org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:61)
>   at org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:36)
>   at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getActualResult(ClusterFailoverTestBase.java:318)
>   at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.assertDataIsFilledWithoutErrors(ClusterFailoverTestBase.java:179)
>   at org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.singleKillAndRestartNodeWhenDataIsLoadedWithOpenTransactions(ClusterFailover3NodesTest.java:126)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>   at io.qameta.allure.junit5.AllureJunit5.interceptTestTemplateMethod(AllureJunit5.java:59)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>   at java.base/java.lang.Thread.run(Thread.java:842)
> {code}
> [^cluster logs.zip]
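
The stack trace above shows the final assertion running inside a retry wrapper and still failing. When triaging, it can help to tell a transient "primary replica not yet elected" window apart from a persistent failure. The following is only a rough sketch of such a wrapper: the class `PrimaryReplicaRetry`, its method names, and the attempt/delay parameters are hypothetical and are not part of Ignite or of the reporter's test suite. The only detail taken from this report is the message text being matched.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not an Ignite API. */
final class PrimaryReplicaRetry {
    /** Message fragment copied from the exception in this report. */
    static final String MARKER = "Failed to get the primary replica";

    private PrimaryReplicaRetry() {
    }

    /** Returns true if the throwable or any of its causes carries the marker message. */
    static boolean isPrimaryReplicaFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action, retrying only on SQLExceptions that match the marker.
     * Any other exception, or a matching one on the last attempt, is rethrown.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (SQLException e) {
                if (!isPrimaryReplicaFailure(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMillis); // give the cluster time to elect a primary
            }
        }
    }
}
```

A JDBC read such as a row count over `failoverTest00` could be passed as the `action`. If the call still fails after a generous number of attempts, as in this report, the error is not a short-lived election gap.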



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
