Wei-Chiu Chuang created HDDS-11773:
--------------------------------------
Summary: Frequent DataNode Ratis snapshotting
Key: HDDS-11773
URL: https://issues.apache.org/jira/browse/HDDS-11773
Project: Apache Ozone
Issue Type: Task
Reporter: Wei-Chiu Chuang
On a cluster with a heavy HBase workload (more than 1000
hsync/WriteChunk+PutBlock requests), Ratis was observed taking snapshots
every 5-8 seconds.
This looks too aggressive and should be tuned to avoid excessive overhead.
cc: [~smeng] I suspect this is related to the small size per checksum, which
forces the client to transmit longer metadata.
{noformat}
2024-11-21 16:46:13,617 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
group-D595E2D0A206: Taking a snapshot at:(t:171, i:4727280) file
/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4727280
2024-11-21 16:46:13,619 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
group-D595E2D0A206: Finished taking a snapshot at:(t:171, i:4727280)
file:/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4727280
took: 3 ms
2024-11-21 16:46:13,620 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.statemachine.impl.SimpleStateMachineStorage:
Deleting old snapshot at
/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4677268
2024-11-21 16:46:13,620 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater:
Took a snapshot at index 4727280
2024-11-21 16:46:13,620 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater:
snapshotIndex: updateIncreasingly 4717277 -> 4727280
2024-11-21 16:46:19,244 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
group-D595E2D0A206: Taking a snapshot at:(t:171, i:4737280) file
/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4737280
2024-11-21 16:46:19,246 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
group-D595E2D0A206: Finished taking a snapshot at:(t:171, i:4737280)
file:/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4737280
took: 2 ms
2024-11-21 16:46:19,246 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.statemachine.impl.SimpleStateMachineStorage:
Deleting old snapshot at
/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4687272
2024-11-21 16:46:19,246 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater:
Took a snapshot at index 4737280
2024-11-21 16:46:19,246 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater:
snapshotIndex: updateIncreasingly 4727280 -> 4737280
2024-11-21 16:46:24,739 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
group-D595E2D0A206: Taking a snapshot at:(t:171, i:4747283) file
/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4747283
2024-11-21 16:46:24,741 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
group-D595E2D0A206: Finished taking a snapshot at:(t:171, i:4747283)
file:/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4747283
took: 2 ms
2024-11-21 16:46:24,741 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.statemachine.impl.SimpleStateMachineStorage:
Deleting old snapshot at
/var/lib/hadoop-ozone/datanode/ratis/data/c420af11-2786-4f5a-9b5a-d595e2d0a206/sm/snapshot.171_4697275
2024-11-21 16:46:24,742 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater:
Took a snapshot at index 4747283
2024-11-21 16:46:24,742 INFO
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-D595E2D0A206-StateMachineUpdater:
snapshotIndex: updateIncreasingly 4737280 -> 4747283
{noformat}
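To put numbers on the log above, here is a minimal sketch that computes the time and log-index gap between consecutive snapshots. The timestamps and indices are copied from the "Taking a snapshot at" lines. The index gaps of about 10,000 suggest an entry-count-based trigger, but that is an inference from the log rather than something stated in this report. The threshold values in the projection are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# (timestamp, snapshot index) pairs copied from the "Taking a snapshot at"
# log lines in this report.
SNAPSHOTS = [
    ("2024-11-21 16:46:13,617", 4727280),
    ("2024-11-21 16:46:19,244", 4737280),
    ("2024-11-21 16:46:24,739", 4747283),
]

# Hypothetical thresholds (log entries between snapshots), for illustration only.
HYPOTHETICAL_THRESHOLDS = [10_000, 100_000, 1_000_000]


def snapshot_gaps(snapshots):
    """Return (seconds, entries, entries_per_second) for each consecutive pair."""
    parsed = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), index)
        for ts, index in snapshots
    ]
    gaps = []
    for (t0, i0), (t1, i1) in zip(parsed, parsed[1:]):
        seconds = (t1 - t0).total_seconds()
        entries = i1 - i0
        gaps.append((seconds, entries, entries / seconds))
    return gaps


def projected_interval(threshold, entries_per_second):
    """Seconds between snapshots if one is taken every `threshold` entries."""
    return threshold / entries_per_second


if __name__ == "__main__":
    gaps = snapshot_gaps(SNAPSHOTS)
    for seconds, entries, rate in gaps:
        print(f"{entries} entries in {seconds:.3f} s ({rate:.0f} entries/s)")

    # Average rate over the observed window, used for the projection below.
    total_seconds = sum(g[0] for g in gaps)
    total_entries = sum(g[1] for g in gaps)
    average_rate = total_entries / total_seconds
    for threshold in HYPOTHETICAL_THRESHOLDS:
        interval = projected_interval(threshold, average_rate)
        print(f"threshold {threshold}: one snapshot every {interval:.1f} s")
```

For these three log entries the gaps are about 5.6 s and 5.5 s for roughly 10,000 entries each, i.e. roughly 1,800 log entries per second. At that rate, a tenfold larger entry threshold would stretch the interval to a little under a minute, assuming the trigger really is entry-count based.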