Wei-Chiu Chuang created HDDS-8769:
-------------------------------------
Summary: [hsync] disk usage thread aborts if ratis log rolls very
quickly
Key: HDDS-8769
URL: https://issues.apache.org/jira/browse/HDDS-8769
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Wei-Chiu Chuang
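The Ratis log file corresponding to an HBase WAL block rolls very quickly.
Each roll renames the in-progress segment (for example, log_inprogress_186383
becomes log_186383-186396). The disk usage thread runs du, which fails when a
file it is about to read has already been renamed, so the refresh aborts and
the DN is unable to get correct disk usage.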
{noformat}
2023-06-05 08:44:55,462
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
created new log segment
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383
2023-06-05 08:44:55,514 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread16]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
Rolling segment log-186383_186396 to index:186396
2023-06-05 08:44:55,516
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
Rolled log segment from
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383
to
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186383-186396
2023-06-05 08:44:55,517
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
created new log segment
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186397
2023-06-05 08:44:55,570 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread18]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
Rolling segment log-186397_186411 to index:186411
2023-06-05 08:44:55,572
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
Rolled log segment from
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186397
to
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186397-186411
2023-06-05 08:44:55,573
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
created new log segment
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186412
2023-06-05 08:44:55,644 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread18]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
Rolling segment log-186412_186434 to index:186434
2023-06-05 08:44:55,646
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
Rolled log segment from
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186412
to
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186412-186434
2023-06-05 08:44:55,647
[37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
created new log segment
/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186435
2023-06-05 08:44:55,673 [DiskUsage-/var/lib/hadoop-ozone/datanode/ratis/data-
] WARN org.apache.hadoop.hdds.fs.CachingSpaceUsageSource: Error refreshing
space usage for /var/lib/hadoop-ozone/datanode/ratis/data
java.io.UncheckedIOException: ExitCodeException exitCode=1: du: cannot access
‘/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383’:
No such file or directory
at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:94)
at
org.apache.hadoop.hdds.fs.AbstractSpaceUsageSource.time(AbstractSpaceUsageSource.java:56)
at org.apache.hadoop.hdds.fs.DU.getUsedSpace(DU.java:63)
at
org.apache.hadoop.hdds.fs.CachingSpaceUsageSource.refresh(CachingSpaceUsageSource.java:140)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=1: du: cannot access
‘/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383’:
No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:91)
... 10 more
{noformat}
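The sequence in the log above (list the directory, roll the segment, then read the old name) can be reproduced in isolation. The following is only a minimal sketch of that race using java.nio; it is not the code path Ozone or du actually uses. The class name RolledSegmentRace and the helper sumSizes are hypothetical, and counting the vanished file instead of aborting is an illustrative choice, not the project's fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RolledSegmentRace {

  /**
   * Hypothetical helper: sums the sizes of files that were listed earlier.
   * Files that no longer exist under the listed name are counted, not fatal.
   * Returns {totalBytes, missingFiles}.
   */
  static long[] sumSizes(List<Path> listed) throws IOException {
    long bytes = 0;
    long missing = 0;
    for (Path p : listed) {
      try {
        bytes += Files.size(p);
      } catch (NoSuchFileException e) {
        // The file was renamed or deleted between listing and reading.
        missing++;
      }
    }
    return new long[] {bytes, missing};
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the Ratis "current" directory (a temp dir, not a real DN path).
    Path dir = Files.createTempDirectory("ratis-current");
    Path inProgress = dir.resolve("log_inprogress_186383");
    Files.write(inProgress, new byte[128]);

    // Step 1: the scanner takes a snapshot of the directory listing.
    List<Path> listed;
    try (Stream<Path> entries = Files.list(dir)) {
      listed = entries.collect(Collectors.toList());
    }

    // Step 2: the segment is rolled, which renames the in-progress file.
    Files.move(inProgress, dir.resolve("log_186383-186396"));

    // Step 3: the scanner reads sizes using the stale names from step 1.
    long[] result = sumSizes(listed);
    System.out.println("bytes=" + result[0] + " missing=" + result[1]);
  }
}
```

Running main prints bytes=0 missing=1: the only listed name is stale by the time it is read, which is the same condition that makes du exit with code 1 in the trace above.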
The workaround is to use DF instead of DU to calculate disk usage
(hdds.datanode.du.factory=org.apache.hadoop.hdds.fs.DedicatedDiskSpaceUsageFactory).
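As a configuration fragment, the workaround might look like the following. This is a sketch that assumes the setting goes in the datanode's ozone-site.xml. The property name is copied exactly as written in this report; the exact key should be checked against ozone-default.xml for the release in use, since it may be spelled differently there (possibly with a ".classname" suffix). Because DF reports usage for the whole filesystem, this presumably fits best when the volume is dedicated to Ozone.

```xml
<!-- Assumed location: ozone-site.xml on each datanode. -->
<!-- Key as given in this report; verify the exact name for your release. -->
<property>
  <name>hdds.datanode.du.factory</name>
  <value>org.apache.hadoop.hdds.fs.DedicatedDiskSpaceUsageFactory</value>
</property>
```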