Wei-Chiu Chuang created HDDS-8769:
-------------------------------------

             Summary: [hsync] disk usage thread aborts if ratis log rolls very quickly
                 Key: HDDS-8769
                 URL: https://issues.apache.org/jira/browse/HDDS-8769
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Wei-Chiu Chuang


The Ratis log file corresponding to an HBase WAL block rolls very quickly.

The disk usage thread aborts because the in-progress log segment is renamed while the scan is running, and the DN is then unable to report correct disk usage.

{noformat}
2023-06-05 08:44:55,462 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: created new log segment /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383
2023-06-05 08:44:55,514 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread16] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: Rolling segment log-186383_186396 to index:186396
2023-06-05 08:44:55,516 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: Rolled log segment from /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383 to /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186383-186396
2023-06-05 08:44:55,517 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: created new log segment /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186397
2023-06-05 08:44:55,570 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread18] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: Rolling segment log-186397_186411 to index:186411
2023-06-05 08:44:55,572 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: Rolled log segment from /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186397 to /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186397-186411
2023-06-05 08:44:55,573 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: created new log segment /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186412
2023-06-05 08:44:55,644 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread18] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: Rolling segment log-186412_186434 to index:186434
2023-06-05 08:44:55,646 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: Rolled log segment from /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186412 to /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186412-186434
2023-06-05 08:44:55,647 [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker: created new log segment /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186435
2023-06-05 08:44:55,673 [DiskUsage-/var/lib/hadoop-ozone/datanode/ratis/data-] WARN org.apache.hadoop.hdds.fs.CachingSpaceUsageSource: Error refreshing space usage for /var/lib/hadoop-ozone/datanode/ratis/data
java.io.UncheckedIOException: ExitCodeException exitCode=1: du: cannot access ‘/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383’: No such file or directory

        at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:94)
        at org.apache.hadoop.hdds.fs.AbstractSpaceUsageSource.time(AbstractSpaceUsageSource.java:56)
        at org.apache.hadoop.hdds.fs.DU.getUsedSpace(DU.java:63)
        at org.apache.hadoop.hdds.fs.CachingSpaceUsageSource.refresh(CachingSpaceUsageSource.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=1: du: cannot access ‘/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383’: No such file or directory

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
        at org.apache.hadoop.util.Shell.run(Shell.java:901)
        at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:91)
        ... 10 more
{noformat}
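
The failure is the classic race between a recursive size scan and a directory whose files are being renamed underneath it: du lists log_inprogress_186383, Ratis rolls it to log_186383-186396, and du then exits 1 when it tries to stat the now-missing name, which aborts the whole refresh. For illustration only (this is not Ozone's code, and the class/method names below are hypothetical), a scan that simply skips files which vanish between listing and stat would look roughly like this:

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

public class VanishTolerantDu {

  // Sum file sizes under root, skipping files that disappear between
  // directory listing and stat (e.g. a Ratis segment rolled mid-scan).
  static long usedBytes(Path root) throws IOException {
    final AtomicLong total = new AtomicLong();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        total.addAndGet(attrs.size());
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFileFailed(Path file, IOException e) {
        // A file like log_inprogress_186383 that was renamed after being
        // listed ends up here: skip it instead of failing the whole scan
        // (plain `du` exits 1 at this point, which kills the refresh).
        if (e instanceof NoSuchFileException) {
          return FileVisitResult.CONTINUE;
        }
        throw new UncheckedIOException(e);
      }
    });
    return total.get();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(usedBytes(Paths.get(args[0])));
  }
}
{code}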

The workaround is to use DF instead of DU to calculate disk usage (set hdds.datanode.du.factory=org.apache.hadoop.hdds.fs.DedicatedDiskSpaceUsageFactory).
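
As a sketch, assuming the property is set in the datanode's ozone-site.xml (the key and value are from the workaround above; the file placement is the usual Hadoop-style convention, not confirmed here):

{code:xml}
<!-- ozone-site.xml on the datanode (assumed placement); key/value taken from the workaround above -->
<property>
  <name>hdds.datanode.du.factory</name>
  <value>org.apache.hadoop.hdds.fs.DedicatedDiskSpaceUsageFactory</value>
</property>
{code}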


