[
https://issues.apache.org/jira/browse/IGNITE-26168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018676#comment-18018676
]
Ignite TC Bot commented on IGNITE-26168:
----------------------------------------
{panel:title=Branch: [pull/12288/head] Base: [master] : No blockers
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/12288/head] Base: [master] : New Tests
(25)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Calcite SQL{color} [[tests
3|https://ci2.ignite.apache.org/viewLog.html?buildId=8589487]]
* {color:#013220}subquery/any_all.test_scalar_any_all.test_ignore -
PASSED{color}
*
{color:#013220}org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.testSqlFieldsCrossCacheQuery[sqlEngine=calcite,
isClient=false loc=false, isFullyFetched=false, isPerfStatsEnabled=false] -
PASSED{color}
*
{color:#013220}org.apache.ignite.internal.processors.query.calcite.exec.rel.SortAggregateExecutionTest.avg[Task
executor = QUERY_BLOCKING, Execution strategy = FIFO, type=MAP_REDUCE] -
PASSED{color}
{color:#00008b}Queries 3 (lazy=true){color} [[tests
2|https://ci2.ignite.apache.org/viewLog.html?buildId=8586749]]
*
{color:#013220}org.apache.ignite.internal.processors.query.IgniteSqlDefaultSchemaTest.testBasicOpsImplicitPublicSchema
- PASSED{color}
*
{color:#013220}org.apache.ignite.internal.processors.query.IgniteSqlNotNullConstraintTest.testTxInvokeAllDelete
- PASSED{color}
{color:#00008b}JDBC Driver{color} [[tests
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586560]]
*
{color:#013220}org.apache.ignite.jdbc.thin.JdbcThinStreamingOrderedSelfTest.testCustomObject
- PASSED{color}
{color:#00008b}Cache 7{color} [[tests
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586523]]
*
{color:#013220}org.apache.ignite.internal.processors.cache.IgniteDynamicCacheStartFailWithPersistenceTest.testTopologyChangesAfterFailure
- PASSED{color}
{color:#00008b}Queries 1 (lazy=true){color} [[tests
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586745]]
*
{color:#013220}org.apache.ignite.internal.processors.cache.IgniteCacheSqlDmlErrorSelfTest.testUpdateMixingValueAndValueFields
- PASSED{color}
{color:#00008b}Queries 1{color} [[tests
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586582]]
*
{color:#013220}org.apache.ignite.internal.processors.cache.index.DynamicIndexClientBasicSelfTest.testDropNoIndexPartitionedTransactional
- PASSED{color}
{color:#00008b}Cache 12{color} [[tests
8|https://ci2.ignite.apache.org/viewLog.html?buildId=8586516]]
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testNodeJoinWithStaleCacheGroupRecoveryData -
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testNodeJoinDuringClusterStateTransition -
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testCoordinatorWithMissingCacheGroupRecoveryData
- PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testNodeJoinAfterActivation - PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testPartitionLossDetectionOnActivation -
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testLostPartitionsRestoredAfterClusterRestart
- PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testLostPartitionsRestoredAfterInactivity -
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12:
IgniteLostPartitionsRecoveryTest.testClusterRestartWthEmptyPartitions -
PASSED{color}
{color:#00008b}Snapshots 3{color} [[tests
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586614]]
*
{color:#013220}org.apache.ignite.internal.processors.cache.persistence.snapshot.dump.IgniteCacheDumpSelfTest.testWithConcurrentRemovals[nodes=1,backups=0,persistence=true,mode=ATOMIC,useDataStreamer=false,onlyPrimary=false,encrypted=false]
- PASSED{color}
{color:#00008b}JCache TCK 1.1{color} [[tests
7|https://ci2.ignite.apache.org/viewLog.html?buildId=8586558]]
* {color:#013220}CacheWriterExceptionTest: TypesTest.sanityCheckTestDomain -
PASSED{color}
* {color:#013220}CacheWriterExceptionTest:
TypesTest.simpleAPINoGenericsAndNoTypeEnforcementStoreByValue - PASSED{color}
* {color:#013220}CacheWriterExceptionTest:
TypesTest.genericsEnforcementAndStricterTypeEnforcement - PASSED{color}
* {color:#013220}CacheWriterExceptionTest:
TypesTest.genericsEnforcementAndStricterTypeEnforcementFromCaching -
PASSED{color}
* {color:#013220}CacheWriterExceptionTest:
TypesTest.simpleAPINoGenericsAndNoTypeEnforcementStoreByReference -
PASSED{color}
* {color:#013220}CacheWriterExceptionTest:
TypesTest.simpleAPITypeEnforcementObject - PASSED{color}
* {color:#013220}CacheWriterExceptionTest:
TypesTest.simpleAPIWithGenericsAndNoTypeEnforcement - PASSED{color}
{panel}
[TeamCity *--> Run :: All*
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=8586785&buildTypeId=IgniteTests24Java8_RunAll]
> Enhance partition loss detection between cluster restarts
> ----------------------------------------------------------
>
> Key: IGNITE-26168
> URL: https://issues.apache.org/jira/browse/IGNITE-26168
> Project: Ignite
> Issue Type: Task
> Reporter: Mikhail Petrov
> Assignee: Mikhail Petrov
> Priority: Major
> Fix For: 2.18
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> The problem is based on a real-world scenario:
> 1. Cluster with PDS enabled is deactivated and stopped gracefully.
> 2. Some physical servers are replaced with their PDS being cleared during
> maintenance (this may also be done unintentionally or due to some hardware
> issues)
> 3. The replaced servers represent all primary and backup nodes for some
> partitions (cell). As a result, the data is lost.
> 4. Cluster is restarted.
> 5. Idle verify procedure completes successfully.
> 6. Cluster is activated successfully.
> As a result, Ignite successfully continues its work after restart. But some
> of the data just disappeared. Ignite users do not see warnings, and data loss
> may be detected accidentally after a while.
> The described situation can be safely resolved by replacing the nodes one by
> one and waiting for the rebalancing to complete.
> But as mentioned in clause 2, PDS data can be lost for different reasons.
> Currently, Ignite supports a mechanism for detecting lost partitions, which is
> designed to restrict cache operations in case some cache partitions are lost
> (due to node leaving or failure). But its behaviour is not consistent between
> cluster restarts/activation and deactivation.
> Consider a cluster with PDS enabled. The following list shows possible
> scenarios when all partition owners (primary and backups) leave the cluster.
> 1. activation -> cell left -> lost parts
> 2. activation -> cell left -> cell joined -> lost parts
> 3. activation -> cell left -> deactivation -> cell joined -> activation ->
> ignored
> 4. activation -> cell left -> cell joined -> deactivation -> activation ->
> lost parts
> 5. activation -> cell left -> deactivation -> activation -> cell joined ->
> lost parts
> 6. deactivation -> cell left -> cell joined -> activation -> ignored
> 7. deactivation -> cell left -> activation -> cell joined -> lost parts
> cell - node group that stores all primary and backup partitions. Can be
> configured via ClusterNodeAttributeColocatedBackupFilter
> lost parts - Ignite detected lost partitions. Cache operations are
> restricted according to the partition loss policy
> ignored - no partition loss is detected. If cell nodes join the cluster
> with PDS data cleared, Ignite will not detect partition loss - it just
> recreates the missing partitions
> deactivation - you can also consider a cluster stop after deactivation and
> cluster start before activation
> It is proposed to fix Ignite to detect lost partitions for clauses 3 and 6.
> Note that we are considering only the case when the cluster is stopped
> gracefully.
> The main idea:
> 1. During PME caused by deactivation, aggregate on the coordinator partition
> info and the list of lost partitions from all nodes.
> 2. Distribute the aggregated information using the PME Full Message and store
> it in each node's local metastorage.
> 3. During activation, use the stored info to detect lost partitions. If some
> partitions have zero update counters in received single messages, but
> according to saved partition info they were updated - mark them as lost.
> Partition Info includes a list of partition IDs that were not
> initialized (update counter == 0; it's crucial because currently Ignite can't
> distinguish between a partition not being updated at all and one being deleted
> between restarts) and a list of partition IDs that were marked as lost at the
> time of deactivation.
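The three steps quoted above can be illustrated with a small standalone sketch. It is a toy model under stated assumptions, not Ignite code: the class `LostPartitionDetectionSketch`, the `PartitionInfo` holder, and every method name are hypothetical, PME messages and the metastorage are reduced to plain in-memory collections, and a partition is treated as "zero" only when no node reports a non-zero update counter for it. Carrying over partitions that were already lost at deactivation is also an assumption about how the saved list would be used.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the proposal; none of these names are Ignite APIs. */
final class LostPartitionDetectionSketch {
    /** Hypothetical "partition info" that each node would save on deactivation. */
    static final class PartitionInfo {
        final Set<Integer> uninitialized; // update counter == 0 on every node
        final Set<Integer> lost;          // already marked lost at deactivation

        PartitionInfo(Set<Integer> uninitialized, Set<Integer> lost) {
            this.uninitialized = uninitialized;
            this.lost = lost;
        }
    }

    /** Highest update counter any node reports for the partition (0 if none). */
    static long maxCounter(int part, Collection<Map<Integer, Long>> nodeCounters) {
        long max = 0;
        for (Map<Integer, Long> counters : nodeCounters)
            max = Math.max(max, counters.getOrDefault(part, 0L));
        return max;
    }

    /** Steps 1-2: aggregate what all nodes report when the cluster is deactivated. */
    static PartitionInfo aggregateOnDeactivation(
        int parts, Collection<Map<Integer, Long>> nodeCounters, Set<Integer> lostNow) {
        Set<Integer> uninitialized = new TreeSet<>();
        for (int p = 0; p < parts; p++) {
            if (maxCounter(p, nodeCounters) == 0)
                uninitialized.add(p);
        }
        return new PartitionInfo(uninitialized, new TreeSet<>(lostNow));
    }

    /** Step 3: a zero counter now, but a non-zero one at deactivation, means lost. */
    static Set<Integer> detectLostOnActivation(
        int parts, PartitionInfo saved, Collection<Map<Integer, Long>> nodeCounters) {
        Set<Integer> lost = new TreeSet<>(saved.lost); // assumption: stays lost
        for (int p = 0; p < parts; p++) {
            if (maxCounter(p, nodeCounters) == 0 && !saved.uninitialized.contains(p))
                lost.add(p);
        }
        return lost;
    }

    /** Helper: builds a partition-to-counter map from positional values. */
    static Map<Integer, Long> counters(long... perPartition) {
        Map<Integer, Long> res = new HashMap<>();
        for (int p = 0; p < perPartition.length; p++)
            res.put(p, perPartition[p]);
        return res;
    }

    public static void main(String[] args) {
        // Before deactivation: partitions 0 and 1 were updated, partition 2 never was.
        List<Map<Integer, Long>> before =
            Arrays.asList(counters(5, 0, 0), counters(5, 7, 0));
        PartitionInfo saved = aggregateOnDeactivation(3, before, new TreeSet<Integer>());

        // After restart: the only owner of partition 1 came back with cleared PDS.
        List<Map<Integer, Long>> after =
            Arrays.asList(counters(5, 0, 0), counters(5, 0, 0));
        System.out.println(detectLostOnActivation(3, saved, after)); // prints [1]
    }
}
```

Partition 2 has a zero counter both before and after the restart, so the saved list of uninitialized partitions is what keeps it from being reported as lost. This is the distinction the description calls crucial.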