[
https://issues.apache.org/jira/browse/IGNITE-23325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Pligin reassigned IGNITE-23325:
----------------------------------------
Assignee: Philipp Shergalis
> Checkpoint read-lock timeout under high load
> --------------------------------------------
>
> Key: IGNITE-23325
> URL: https://issues.apache.org/jira/browse/IGNITE-23325
> Project: Ignite
> Issue Type: Bug
> Reporter: Ivan Bessonov
> Assignee: Philipp Shergalis
> Priority: Major
> Labels: ignite-3
>
> We encounter the following situation while having a very intensive load. It
> should not happen.
> Can be reproduced in {{ignite-22835-2}} on
> {{6bc1f97d0d2506b666975eea57b78ad8609b69d7}} commit.
> {noformat}
> [2024-09-25T11:03:12,180][INFO ][%node_3344%checkpoint-thread][Checkpointer]
> Checkpoint started [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82,
> checkpointBeforeWriteLockTime=30ms, checkpointWriteLockWait=1ms,
> checkpointListenersExecuteTime=2ms, checkpointWriteLockHoldTime=4ms,
> splitAndSortPagesDuration=276ms, pages=80904, reason='too many dirty pages']
> 227808.089 ops/s
> [2024-09-25T11:03:13,438][WARN
> ][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-10][PersistentPageMemory]
> Page replacements started, pages will be rotated with disk, this will affect
> storage performance (consider increasing
> PageMemoryDataRegionConfiguration#setMaxSize for data region) [region=default]
> [2024-09-25T11:03:13,796][INFO
> ][checkpoint-runner-io0][CheckpointPagesWriter] Checkpoint pages were not
> written yet due to unsuccessful page write lock acquisition and will be
> retried [pageCount=1]
> [2024-09-25T11:03:14,071][INFO ][%node_3344%checkpoint-thread][Checkpointer]
> Checkpoint finished [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82,
> pages=80904, pagesWriteTime=1614ms, fsyncTime=273ms, totalTime=2203ms]
> [2024-09-25T11:03:14,073][INFO ][%node_3344%compaction-thread][Compactor]
> Starting new compaction round [files=64]
> [2024-09-25T11:03:15,828][INFO ][%node_3344%checkpoint-thread][Checkpointer]
> Checkpoint started [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8,
> checkpointBeforeWriteLockTime=21ms, checkpointWriteLockWait=0ms,
> checkpointListenersExecuteTime=6ms, checkpointWriteLockHoldTime=6ms,
> splitAndSortPagesDuration=205ms, pages=77234, reason='too many dirty pages']
> 231068.630 ops/s
> 271814.068 ops/s
> # Warmup Iteration 8: [2024-09-25T11:03:23,547][INFO
> ][%node_3344%compaction-thread][Compactor] Starting new compaction round
> [files=64]
> [2024-09-25T11:03:23,547][INFO ][%node_3344%checkpoint-thread][Checkpointer]
> Checkpoint finished [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8,
> pages=77234, pagesWriteTime=2547ms, fsyncTime=5099ms, totalTime=7951ms]
> 26376.850 ops/s
> # Warmup Iteration 9: [2024-09-25T11:03:23,685][INFO
> ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started
> [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c,
> checkpointBeforeWriteLockTime=9ms, checkpointWriteLockWait=0ms,
> checkpointListenersExecuteTime=1ms, checkpointWriteLockHoldTime=1ms,
> splitAndSortPagesDuration=125ms, pages=95251, reason='too many dirty pages']
> [2024-09-25T11:03:34,706][INFO ][%node_3344%compaction-thread][Compactor]
> Compaction round finished [duration=11159ms]
> [2024-09-25T11:03:34,707][INFO ][%node_3344%compaction-thread][Compactor]
> Starting new compaction round [files=41]
> [2024-09-25T11:03:35,444][INFO ][%node_3344%lease-updater][LeaseUpdater]
> Leases updated (printed once per 10 iteration(s)):
> [inCurrentIteration=LeaseStats [leasesCreated=0, leasesPublished=0,
> leasesProlonged=64, leasesWithoutCandidates=0], active=64,
> currentAssignmentsSize=64].
> [2024-09-25T11:03:39,075][WARN
> ][%node_3344%meta-storage-safe-time-0][TrackableNetworkMessageHandler]
> Message handling has been too long [duration=11ms, message=class
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]
> [2024-09-25T11:03:40,287][INFO ][%node_3344%checkpoint-thread][Checkpointer]
> Checkpoint finished [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c,
> pages=95251, pagesWriteTime=3098ms, fsyncTime=13146ms, totalTime=16740ms]
> [2024-09-25T11:03:40,307][WARN
> ][%node_3344%JRaft-FSMCaller-Disruptor_stripe_24-0][FailureProcessor]
> Possible failure suppressed according to a configured handler
> [hnd=NoOpFailureHandler [super=AbstractFailureHandler
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT]
> org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock
> acquisition has been timed out.
> at
> org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242)
> ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130)
> ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175)
> ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:659)
> ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:170)
> ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.advanceLastAppliedIndexConsistently(PartitionListener.java:508)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.handleSafeTimeSyncCommand(PartitionListener.java:504)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWriteBusy$2(PartitionListener.java:250)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> ~[?:?]
> at
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWriteBusy(PartitionListener.java:199)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:192)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:784)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:686)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:654)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:573)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:246)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:240)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.consumeBatch(StripedDisruptor.java:456)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:334)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:272)
> ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
> ~[disruptor-4.0.0.jar:?]
> at
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
> ~[disruptor-4.0.0.jar:?]
> at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
> [2024-09-25T11:03:40,321][WARN
> ][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-6][FailureProcessor]
> Possible failure suppressed according to a configured handler
> [hnd=NoOpFailureHandler [super=AbstractFailureHandler
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT]
> org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock
> acquisition has been timed out.
> at
> org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242)
> ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130)
> ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175)
> ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:659)
> ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:170)
> ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.handleUpdate(StorageUpdateHandler.java:109)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyUpdateCommand$125(PartitionReplicaListener.java:2766)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.tracing.Instrumentation.measure(Instrumentation.java:217)
> ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyUpdateCommand$126(PartitionReplicaListener.java:2766)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
> ~[?:?]
> at
> java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2341)
> ~[?:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.applyUpdateCommand(PartitionReplicaListener.java:2754)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.applyUpdateCommand(PartitionReplicaListener.java:2805)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processSingleEntryAction$147(PartitionReplicaListener.java:3053)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
> ~[?:?]
> at
> java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2341)
> ~[?:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processSingleEntryAction$149(PartitionReplicaListener.java:3052)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> at
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
> ~[?:?]
> at
> java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2341)
> ~[?:?]
> at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processSingleEntryAction$152(PartitionReplicaListener.java:3050)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)