adoroszlai opened a new pull request, #4966:
URL: https://github.com/apache/ozone/pull/4966
## What changes were proposed in this pull request?
If a failure happens during volume initialization, the volume object is
abandoned and a failed volume is created instead. The original object should
be cleaned up.
1. `StorageVolume#initialize` may throw `IOException`. By that time it may
have started a background thread for the disk usage check, and `HddsVolume`
has also created and registered metrics objects.
2. `MutableVolumeSet#initializeVolumeSet` may also throw `IOException` if
the storage directory does not exist and cannot be created.
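The failure mode described above can be sketched in isolation. This is a minimal illustration, not the actual patch: `Volume`, `METRIC_SOURCES`, `createVolume`, and the `writable` flag are hypothetical stand-ins for the Ozone classes and the Hadoop metrics system. It only shows the ordering that matters: shut down the abandoned object (stop its thread, unregister its metrics) before creating the failed volume for the same path.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class VolumeCleanupSketch {

  /** Hypothetical stand-in for a metrics system that rejects duplicate source names. */
  static final Set<String> METRIC_SOURCES = ConcurrentHashMap.newKeySet();

  /** Hypothetical stand-in for a storage volume; not the real StorageVolume API. */
  static class Volume {
    private final String root;
    private final boolean failed;
    private Thread usageChecker;

    Volume(String root, boolean failed) {
      this.root = root;
      this.failed = failed;
      // Metrics are registered in the constructor; a duplicate name is an error.
      if (!METRIC_SOURCES.add("VolumeInfoMetrics-" + root)) {
        throw new IllegalStateException(
            "Metrics source VolumeInfoMetrics-" + root + " already exists!");
      }
    }

    void initialize(boolean writable) throws IOException {
      // The background thread starts before the failure is detected.
      usageChecker = new Thread(() -> {
        try {
          Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, "DiskUsage-" + root);
      usageChecker.setDaemon(true);
      usageChecker.start();
      if (!writable) {
        throw new IOException("Cannot create directory " + root);
      }
    }

    /** Releases everything the constructor and initialize() acquired. */
    void shutdown() {
      if (usageChecker != null) {
        usageChecker.interrupt();
      }
      METRIC_SOURCES.remove("VolumeInfoMetrics-" + root);
    }

    boolean isFailed() {
      return failed;
    }
  }

  static Volume createVolume(String root, boolean writable) {
    Volume volume = new Volume(root, false);
    try {
      volume.initialize(writable);
      return volume;
    } catch (IOException e) {
      // Clean up the abandoned object first; otherwise the constructor below
      // fails on the duplicate metrics source and the thread is left running.
      volume.shutdown();
      return new Volume(root, true);
    }
  }

  public static void main(String[] args) {
    Volume v = createVolume("/root", false);
    System.out.println("failed=" + v.isFailed());
  }
}
```

Removing the `volume.shutdown()` call in the `catch` block makes the second constructor throw on the duplicate source name, which mirrors the startup failure in the log, under the assumptions of this sketch.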
To reproduce, add `/root` as an additional data or DB directory
(assuming a non-root user runs the datanode):
```
OZONE-SITE.XML_hdds.datanode.dir=/root,/data/hdds
OZONE-SITE.XML_hdds.datanode.container.db.dir=/root,/data/metadata/db
```
A failed data volume causes the datanode to stop during startup:
```
datanode_1 | [main] INFO volume.HddsVolume: Creating HddsVolume: /root/hdds of storage type : DISK capacity : 499596230656
datanode_1 | [main] ERROR ozone.HddsDatanodeService: Exception in HddsDatanodeService.
datanode_1 | org.apache.hadoop.metrics2.MetricsException: Metrics source VolumeInfoMetrics-/root already exists!
datanode_1 | at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
datanode_1 | at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
datanode_1 | at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.VolumeInfoMetrics.init(VolumeInfoMetrics.java:50)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.VolumeInfoMetrics.<init>(VolumeInfoMetrics.java:45)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:140)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:73)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:115)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.HddsVolumeFactory.createFailedVolume(HddsVolumeFactory.java:60)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:187)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:135)
datanode_1 | at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:99)
datanode_1 | at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:146)
datanode_1 | at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:173)
datanode_1 | at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:306)
```
A failed DB volume does not stop the datanode, but it leaves behind a thread
for the abandoned volume (`/root` here):
```
$ jstack $(jps | grep Datanode | awk '{ print $1 }') | grep DiskUsage
"DiskUsage-/data/hdds-
"DiskUsage-/data/metadata/ratis-
"DiskUsage-/root-
"DiskUsage-/data/metadata/db-
```
https://issues.apache.org/jira/browse/HDDS-8914
## How was this patch tested?
Verified that the datanode starts successfully with an additional read-only
volume, and that the `DiskUsage` thread for the abandoned volume is not running.
```
$ jstack $(jps | grep Datanode | awk '{ print $1 }') | grep DiskUsage
"DiskUsage-/data/hdds-
"DiskUsage-/data/metadata/ratis-
"DiskUsage-/data/metadata/db-
```
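The same check could in principle be automated from inside a JVM, for example in a unit test. The sketch below is an assumption about how one might do that, not part of this patch: `DiskUsageThreadCheck` and `liveThreadsWithPrefix` are hypothetical names, and `Thread.getAllStackTraces()` only sees threads of the JVM it runs in, so it does not replace `jstack` against a running datanode.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DiskUsageThreadCheck {

  /** Returns the sorted names of live threads in this JVM that start with the prefix. */
  static List<String> liveThreadsWithPrefix(String prefix) {
    return Thread.getAllStackTraces().keySet().stream()
        .map(Thread::getName)
        .filter(name -> name.startsWith(prefix))
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Prints one line per matching thread, similar to the grep above.
    liveThreadsWithPrefix("DiskUsage-").forEach(System.out::println);
  }
}
```

A test could assert that no returned name starts with `DiskUsage-` followed by the path of the abandoned volume.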