[ 
https://issues.apache.org/jira/browse/IGNITE-26168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018676#comment-18018676
 ] 

Ignite TC Bot commented on IGNITE-26168:
----------------------------------------

{panel:title=Branch: [pull/12288/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/12288/head] Base: [master] : New Tests 
(25)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Calcite SQL{color} [[tests 
3|https://ci2.ignite.apache.org/viewLog.html?buildId=8589487]]
* {color:#013220}subquery/any_all.test_scalar_any_all.test_ignore - 
PASSED{color}
* 
{color:#013220}org.apache.ignite.internal.processors.query.calcite.integration.SqlPlanHistoryIntegrationTest.testSqlFieldsCrossCacheQuery[sqlEngine=calcite,
 isClient=false loc=false, isFullyFetched=false, isPerfStatsEnabled=false] - 
PASSED{color}
* 
{color:#013220}org.apache.ignite.internal.processors.query.calcite.exec.rel.SortAggregateExecutionTest.avg[Task
 executor = QUERY_BLOCKING, Execution strategy = FIFO, type=MAP_REDUCE] - 
PASSED{color}

{color:#00008b}Queries 3 (lazy=true){color} [[tests 
2|https://ci2.ignite.apache.org/viewLog.html?buildId=8586749]]
* 
{color:#013220}org.apache.ignite.internal.processors.query.IgniteSqlDefaultSchemaTest.testBasicOpsImplicitPublicSchema
 - PASSED{color}
* 
{color:#013220}org.apache.ignite.internal.processors.query.IgniteSqlNotNullConstraintTest.testTxInvokeAllDelete
 - PASSED{color}

{color:#00008b}JDBC Driver{color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586560]]
* 
{color:#013220}org.apache.ignite.jdbc.thin.JdbcThinStreamingOrderedSelfTest.testCustomObject
 - PASSED{color}

{color:#00008b}Cache 7{color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586523]]
* 
{color:#013220}org.apache.ignite.internal.processors.cache.IgniteDynamicCacheStartFailWithPersistenceTest.testTopologyChangesAfterFailure
 - PASSED{color}

{color:#00008b}Queries 1 (lazy=true){color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586745]]
* 
{color:#013220}org.apache.ignite.internal.processors.cache.IgniteCacheSqlDmlErrorSelfTest.testUpdateMixingValueAndValueFields
 - PASSED{color}

{color:#00008b}Queries 1{color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586582]]
* 
{color:#013220}org.apache.ignite.internal.processors.cache.index.DynamicIndexClientBasicSelfTest.testDropNoIndexPartitionedTransactional
 - PASSED{color}

{color:#00008b}Cache 12{color} [[tests 
8|https://ci2.ignite.apache.org/viewLog.html?buildId=8586516]]
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testNodeJoinWithStaleCacheGroupRecoveryData - 
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testNodeJoinDuringClusterStateTransition - 
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testCoordinatorWithMissingCacheGroupRecoveryData
 - PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testNodeJoinAfterActivation - PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testPartitionLossDetectionOnActivation - 
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testLostPartitionsRestoredAfterClusterRestart 
- PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testLostPartitionsRestoredAfterInactivity - 
PASSED{color}
* {color:#013220}IgniteCacheTestSuite12: 
IgniteLostPartitionsRecoveryTest.testClusterRestartWthEmptyPartitions - 
PASSED{color}

{color:#00008b}Snapshots 3{color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=8586614]]
* 
{color:#013220}org.apache.ignite.internal.processors.cache.persistence.snapshot.dump.IgniteCacheDumpSelfTest.testWithConcurrentRemovals[nodes=1,backups=0,persistence=true,mode=ATOMIC,useDataStreamer=false,onlyPrimary=false,encrypted=false]
 - PASSED{color}

{color:#00008b}JCache TCK 1.1{color} [[tests 
7|https://ci2.ignite.apache.org/viewLog.html?buildId=8586558]]
* {color:#013220}CacheWriterExceptionTest: TypesTest.sanityCheckTestDomain - 
PASSED{color}
* {color:#013220}CacheWriterExceptionTest: 
TypesTest.simpleAPINoGenericsAndNoTypeEnforcementStoreByValue - PASSED{color}
* {color:#013220}CacheWriterExceptionTest: 
TypesTest.genericsEnforcementAndStricterTypeEnforcement - PASSED{color}
* {color:#013220}CacheWriterExceptionTest: 
TypesTest.genericsEnforcementAndStricterTypeEnforcementFromCaching - 
PASSED{color}
* {color:#013220}CacheWriterExceptionTest: 
TypesTest.simpleAPINoGenericsAndNoTypeEnforcementStoreByReference - 
PASSED{color}
* {color:#013220}CacheWriterExceptionTest: 
TypesTest.simpleAPITypeEnforcementObject - PASSED{color}
* {color:#013220}CacheWriterExceptionTest: 
TypesTest.simpleAPIWithGenericsAndNoTypeEnforcement - PASSED{color}

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=8586785&buildTypeId=IgniteTests24Java8_RunAll]

> Enhance partition loss detection between cluster restarts 
> ----------------------------------------------------------
>
>                 Key: IGNITE-26168
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26168
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Mikhail Petrov
>            Assignee: Mikhail Petrov
>            Priority: Major
>             Fix For: 2.18
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The problem is based on a real-world scenario:
> 1. A cluster with PDS enabled is deactivated and stopped gracefully.
> 2. Some physical servers are replaced with their PDS being cleared during 
> maintenance (this may also happen unintentionally or due to hardware 
> issues).
> 3. The replaced servers represent all primary and backup nodes for some 
> partitions (a cell). As a result, the data is lost.
> 4. Cluster is restarted.
> 5. Idle verify procedure completes successfully.
> 6. Cluster is activated successfully.
> As a result, Ignite successfully continues its work after restart. But some 
> of the data just disappeared. Ignite users do not see warnings, and data loss 
> may be detected accidentally after a while.
> The described situation can be safely resolved by replacing the nodes one by 
> one and waiting for the rebalancing to complete.
> But, as mentioned in step 2, PDS data can be lost for various reasons.
> Currently, Ignite supports a mechanism for detecting lost partitions, which is 
> designed to restrict cache operations in case some cache partitions are lost 
> (due to a node leaving or failing). But its behaviour is not consistent across 
> cluster restarts and activation/deactivation.
> Consider a cluster with PDS enabled. The following list shows the possible 
> scenarios in which all owners of a partition (primary and backups) leave the cluster.
> 1. activation -> cell left -> lost parts 
> 2. activation -> cell left -> cell joined -> lost parts
> 3. activation -> cell left -> deactivation -> cell joined -> activation -> 
> ignored
> 4. activation -> cell left -> cell joined -> deactivation -> activation -> 
> lost parts
> 5. activation -> cell left -> deactivation -> activation -> cell joined -> 
> lost parts
> 6. deactivation -> cell left -> cell joined -> activation -> ignored
> 7. deactivation -> cell left -> activation -> cell joined -> lost parts
> cell         - a group of nodes that stores all primary and backup copies of 
> some partitions. Can be configured via ClusterNodeAttributeColocatedBackupFilter.
> lost parts   - Ignite detected lost partitions. Cache operations are 
> restricted according to the partition loss policy.
> ignored      - no partition loss is detected. If cell nodes join the cluster 
> with PDS data cleared, Ignite will not detect the partition loss; it just 
> recreates the missing partitions.
> deactivation - may also include a cluster stop after deactivation and a 
> cluster start before activation.
> It is proposed to fix Ignite to detect lost partitions in scenarios 3 and 6. 
> Note that we are considering only the case when the cluster is stopped gracefully.
> The main idea:
> 1. During the PME caused by deactivation, aggregate on the coordinator the 
> partition info and the list of lost partitions from all nodes.
> 2. Distribute the aggregated information using the PME Full Message and store 
> it in each node's local metastorage.
> 3. During activation, use the stored info to detect lost partitions. If some 
> partitions have zero update counters in the received single messages but, 
> according to the saved partition info, they were updated, mark them as lost.
> Partition info includes a list of partition IDs that were not 
> initialized (update counter == 0; this is crucial because Ignite currently 
> cannot distinguish between a partition that was never updated and one that 
> was deleted between restarts) and a list of partition IDs that were marked as 
> lost at the time of deactivation.
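
The three steps of the proposal can be sketched in plain Java as shown below. This is only an illustrative model, not Ignite code: the class `LostPartitionSketch`, the record `SavedPartitionInfo`, and the helpers `counters`, `aggregate`, and `detectLost` are all hypothetical names, and each node report is assumed to be a simple map from partition ID to update counter rather than a real PME single message.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative model only; none of these types exist in Ignite. */
final class LostPartitionSketch {
    /** Hypothetical record stored in each node's metastorage on deactivation. */
    static final class SavedPartitionInfo {
        /** Partitions whose update counter was 0 on every node at deactivation. */
        final Set<Integer> uninitialized;
        /** Partitions already marked as lost at deactivation. */
        final Set<Integer> lost;

        SavedPartitionInfo(Set<Integer> uninitialized, Set<Integer> lost) {
            this.uninitialized = uninitialized;
            this.lost = lost;
        }
    }

    /** Builds one node report: index i holds the update counter of partition i. */
    static Map<Integer, Long> counters(long... cntrs) {
        Map<Integer, Long> res = new HashMap<>();
        for (int p = 0; p < cntrs.length; p++)
            res.put(p, cntrs[p]);
        return res;
    }

    /** Highest update counter reported for each partition across all reports. */
    static Map<Integer, Long> maxCounters(int parts, Collection<Map<Integer, Long>> reports) {
        Map<Integer, Long> max = new HashMap<>();
        for (int p = 0; p < parts; p++) {
            long c = 0;
            for (Map<Integer, Long> r : reports)
                c = Math.max(c, r.getOrDefault(p, 0L));
            max.put(p, c);
        }
        return max;
    }

    /** Steps 1-2: aggregate node reports on deactivation into the record to be stored. */
    static SavedPartitionInfo aggregate(int parts, Collection<Map<Integer, Long>> reports,
        Set<Integer> lostAtDeactivation) {
        Set<Integer> uninit = new TreeSet<>();
        for (Map.Entry<Integer, Long> e : maxCounters(parts, reports).entrySet()) {
            if (e.getValue() == 0L)
                uninit.add(e.getKey());
        }
        return new SavedPartitionInfo(uninit, new TreeSet<>(lostAtDeactivation));
    }

    /** Step 3: on activation, compare fresh reports with the saved record. */
    static Set<Integer> detectLost(int parts, SavedPartitionInfo saved,
        Collection<Map<Integer, Long>> reports) {
        Set<Integer> lost = new TreeSet<>(saved.lost);
        for (Map.Entry<Integer, Long> e : maxCounters(parts, reports).entrySet()) {
            // Zero counter now, but the partition had been updated before deactivation.
            if (e.getValue() == 0L && !saved.uninitialized.contains(e.getKey()))
                lost.add(e.getKey());
        }
        return lost;
    }

    public static void main(String[] args) {
        // Before deactivation: partitions 0 and 1 were updated, partition 2 never was.
        List<Map<Integer, Long>> before = Arrays.asList(counters(5, 7, 0), counters(5, 7, 0));
        SavedPartitionInfo saved = aggregate(3, before, Collections.<Integer>emptySet());

        // After restart: the owners of partition 1 came back with cleared PDS.
        List<Map<Integer, Long>> after = Arrays.asList(counters(5, 0, 0), counters(5, 0, 0));
        System.out.println(detectLost(3, saved, after)); // prints [1]
    }
}
```

In this model partition 2 is not reported as lost because the saved record says it was never initialized, which is exactly the distinction the description calls crucial. Partition 1 is reported because its counter dropped to zero although the saved record shows it had been updated.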


