Aleksey Plekhanov created IGNITE-19111:
------------------------------------------
Summary: Storage corruption if pages changed after last checkpoint during deactivation
Key: IGNITE-19111
URL: https://issues.apache.org/jira/browse/IGNITE-19111
Project: Ignite
Issue Type: Bug
Reporter: Aleksey Plekhanov
Assignee: Aleksey Plekhanov
During cluster deactivation we force a checkpoint (with the "caches stop" reason) and remove the checkpoint listeners before the caches are actually stopped. However, if data pages on the node are modified after that checkpoint but before the caches are stopped and the next checkpoint starts, the storage can be corrupted.
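The ordering problem can be illustrated with a toy model. This is a hypothetical sketch, not Ignite code: it assumes, for illustration only, that a checkpoint listener is what persists some in-memory metadata (here a row counter), while data pages are written by every checkpoint regardless of listeners. The names {{DeactivationRaceModel}}, {{Listener}} and {{rowCount}} are made up for this example. Once the listeners are removed, a page change made after the forced checkpoint is still written by the next checkpoint, but the metadata describing it is not, so the on-disk state becomes inconsistent.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model (not Ignite code) of the deactivation ordering problem:
 * data pages are written by every checkpoint, but metadata is written only by a listener.
 */
public class DeactivationRaceModel {
    /** Hypothetical checkpoint listener that persists in-memory metadata. */
    interface Listener {
        void onCheckpointBegin(Map<String, Integer> disk);
    }

    /** In-memory copy of data pages, keyed by a "row.*" name. */
    final Map<String, Integer> memory = new HashMap<>();

    /** State persisted by checkpoints. */
    final Map<String, Integer> disk = new HashMap<>();

    /** Registered checkpoint listeners. */
    final List<Listener> listeners = new ArrayList<>();

    /** In-memory metadata, persisted only by the listener registered in the constructor. */
    int rowCount;

    DeactivationRaceModel() {
        listeners.add(d -> d.put("meta.rowCount", rowCount));
    }

    /** Modifies a data page; a new row also updates the in-memory metadata. */
    void put(String row, int val) {
        if (!memory.containsKey(row))
            rowCount++;

        memory.put(row, val);
    }

    /** Runs the listeners, then writes the data pages to disk (all of them, for simplicity). */
    void checkpoint() {
        for (Listener l : listeners)
            l.onCheckpointBegin(disk);

        disk.putAll(memory);
    }

    /** Checks that the persisted metadata matches the persisted data pages. */
    boolean consistent() {
        long rows = disk.keySet().stream().filter(k -> k.startsWith("row.")).count();

        return disk.getOrDefault("meta.rowCount", 0) == rows;
    }

    public static void main(String[] args) {
        DeactivationRaceModel m = new DeactivationRaceModel();
        m.put("row.1", 1);

        // Deactivation: forced checkpoint, then the listeners are removed.
        m.checkpoint();
        m.listeners.clear();

        // Late page change before the caches are stopped.
        m.put("row.2", 2);

        // The next checkpoint writes the page, but nothing writes the metadata.
        m.checkpoint();

        System.out.println(m.consistent()); // prints "false"
    }
}
{code}

For comparison, if the listener stays registered, the second checkpoint also rewrites {{meta.rowCount}} and {{consistent()}} returns {{true}}.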
Reproducer:
{code:java}
/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
    return super.getConfiguration(igniteInstanceName)
        .setDataStorageConfiguration(new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(new DataRegionConfiguration().setPersistenceEnabled(true))
            // Checkpoint every second.
            .setCheckpointFrequency(1_000L))
        // Stop the node on a critical failure such as CorruptedTreeException.
        .setFailureHandler(new StopNodeFailureHandler());
}

/** */
@Test
public void testCpAfterClusterDeactivate() throws Exception {
    IgniteEx ignite0 = startGrid(0);
    IgniteEx ignite1 = startGrid(1);

    ignite0.cluster().state(ClusterState.ACTIVE);

    ignite0.getOrCreateCache(new CacheConfiguration<>(DEFAULT_CACHE_NAME).setBackups(1)
        .setAffinity(new RendezvousAffinityFunction(false, 10)));

    // Load the initial data while both nodes are in the topology.
    try (IgniteDataStreamer<Integer, Integer> streamer = ignite0.dataStreamer(DEFAULT_CACHE_NAME)) {
        for (int i = 0; i < 100_000; i++)
            streamer.addData(i, i);
    }

    stopGrid(0);

    // Overwrite all values while node 0 is down, so that it has stale data when it rejoins.
    try (IgniteDataStreamer<Integer, Integer> streamer = ignite1.dataStreamer(DEFAULT_CACHE_NAME)) {
        streamer.allowOverwrite(true);

        for (int i = 0; i < 100_000; i++)
            streamer.addData(i, i + 1);
    }

    ignite0 = startGrid(0);

    // Slow down the checkpoint forced by deactivation ("caches stop" reason) to make the race easier to hit.
    ((GridCacheDatabaseSharedManager)ignite0.context().cache().context().database())
        .addCheckpointListener(new CheckpointListener() {
            @Override public void onMarkCheckpointBegin(Context ctx) {
                // No-op.
            }

            @Override public void onCheckpointBegin(Context ctx) {
                if ("caches stop".equals(ctx.progress().reason()))
                    doSleep(1_000L);
            }

            @Override public void beforeCheckpointBegin(Context ctx) {
                // No-op.
            }
        });

    ignite0.cluster().state(ClusterState.INACTIVE);

    doSleep(2_000L);

    ignite0.cluster().state(ClusterState.ACTIVE);

    IgniteCache<Integer, Integer> cache = ignite0.cache(DEFAULT_CACHE_NAME);

    // Every key must hold the overwritten value.
    for (int i = 0; i < 100_000; i++)
        assertEquals((Integer)(i + 1), cache.get(i));
}
{code}
This reproducer shuts down the node with some probability (about 1 run in 5 on my laptop), either on activation or during the final check, with a {{CorruptedTreeException}}.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)