Ivan Bessonov created IGNITE-23325:
--------------------------------------

             Summary: Checkpoint read-lock timeout under high load
                 Key: IGNITE-23325
                 URL: https://issues.apache.org/jira/browse/IGNITE-23325
             Project: Ignite
          Issue Type: Bug
            Reporter: Ivan Bessonov


We encounter the following situation while having a very intensive load. It 
should not happen
{noformat}
[2024-09-25T11:03:12,180][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint started [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, 
checkpointBeforeWriteLockTime=30ms, checkpointWriteLockWait=1ms, 
checkpointListenersExecuteTime=2ms, checkpointWriteLockHoldTime=4ms, 
splitAndSortPagesDuration=276ms, pages=80904, reason='too many dirty pages']
227808.089 ops/s
[2024-09-25T11:03:13,438][WARN 
][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-10][PersistentPageMemory]
 Page replacements started, pages will be rotated with disk, this will affect 
storage performance (consider increasing 
PageMemoryDataRegionConfiguration#setMaxSize for data region) [region=default]
[2024-09-25T11:03:13,796][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] 
Checkpoint pages were not written yet due to unsuccessful page write lock 
acquisition and will be retried [pageCount=1]
[2024-09-25T11:03:14,071][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, 
pages=80904, pagesWriteTime=1614ms, fsyncTime=273ms, totalTime=2203ms]
[2024-09-25T11:03:14,073][INFO ][%node_3344%compaction-thread][Compactor] 
Starting new compaction round [files=64]
[2024-09-25T11:03:15,828][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint started [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, 
checkpointBeforeWriteLockTime=21ms, checkpointWriteLockWait=0ms, 
checkpointListenersExecuteTime=6ms, checkpointWriteLockHoldTime=6ms, 
splitAndSortPagesDuration=205ms, pages=77234, reason='too many dirty pages']
231068.630 ops/s
271814.068 ops/s
# Warmup Iteration   8: [2024-09-25T11:03:23,547][INFO 
][%node_3344%compaction-thread][Compactor] Starting new compaction round 
[files=64]
[2024-09-25T11:03:23,547][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, 
pages=77234, pagesWriteTime=2547ms, fsyncTime=5099ms, totalTime=7951ms]
26376.850 ops/s
# Warmup Iteration   9: [2024-09-25T11:03:23,685][INFO 
][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started 
[checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, 
checkpointBeforeWriteLockTime=9ms, checkpointWriteLockWait=0ms, 
checkpointListenersExecuteTime=1ms, checkpointWriteLockHoldTime=1ms, 
splitAndSortPagesDuration=125ms, pages=95251, reason='too many dirty pages']
[2024-09-25T11:03:34,706][INFO ][%node_3344%compaction-thread][Compactor] 
Compaction round finished [duration=11159ms]
[2024-09-25T11:03:34,707][INFO ][%node_3344%compaction-thread][Compactor] 
Starting new compaction round [files=41]
[2024-09-25T11:03:35,444][INFO ][%node_3344%lease-updater][LeaseUpdater] Leases 
updated (printed once per 10 iteration(s)): [inCurrentIteration=LeaseStats 
[leasesCreated=0, leasesPublished=0, leasesProlonged=64, 
leasesWithoutCandidates=0], active=64, currentAssignmentsSize=64].
[2024-09-25T11:03:39,075][WARN 
][%node_3344%meta-storage-safe-time-0][TrackableNetworkMessageHandler] Message 
handling has been too long [duration=11ms, message=class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]
[2024-09-25T11:03:40,287][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, 
pages=95251, pagesWriteTime=3098ms, fsyncTime=13146ms, totalTime=16740ms]
[2024-09-25T11:03:40,307][WARN 
][%node_3344%JRaft-FSMCaller-Disruptor_stripe_24-0][FailureProcessor] Possible 
failure suppressed according to a configured handler [hnd=NoOpFailureHandler 
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT]
org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock 
acquisition has been timed out.
        at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:659)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:170)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.raft.PartitionListener.advanceLastAppliedIndexConsistently(PartitionListener.java:508)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.raft.PartitionListener.handleSafeTimeSyncCommand(PartitionListener.java:504)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWriteBusy$2(PartitionListener.java:250)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) 
~[?:?]
        at 
org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWriteBusy(PartitionListener.java:199)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:192)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:784)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:686)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:654)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:573)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:246)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:240)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.consumeBatch(StripedDisruptor.java:456)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:334)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:272)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
        at 
com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
 ~[disruptor-4.0.0.jar:?]
        at 
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122) 
~[disruptor-4.0.0.jar:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-09-25T11:03:40,321][WARN 
][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-6][FailureProcessor]
 Possible failure suppressed according to a configured handler 
[hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT]
org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock 
acquisition has been timed out.
        at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:659)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:170)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.StorageUpdateHandler.handleUpdate(StorageUpdateHandler.java:109)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyUpdateCommand$125(PartitionReplicaListener.java:2766)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:217)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyUpdateCommand$126(PartitionReplicaListener.java:2766)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
 ~[?:?]
        at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2341)
 ~[?:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.applyUpdateCommand(PartitionReplicaListener.java:2754)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.applyUpdateCommand(PartitionReplicaListener.java:2805)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processSingleEntryAction$147(PartitionReplicaListener.java:3053)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
 ~[?:?]
        at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2341)
 ~[?:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processSingleEntryAction$149(PartitionReplicaListener.java:3052)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
        at 
java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
 ~[?:?]
        at 
java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2341)
 ~[?:?]
        at 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processSingleEntryAction$152(PartitionReplicaListener.java:3050)
 ~[ignite-table-3.0.0-SNAPSHOT.jar:?]{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to