[ 
https://issues.apache.org/jira/browse/HDDS-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18034979#comment-18034979
 ] 

Sammi Chen edited comment on HDDS-13866 at 11/3/25 8:21 AM:
------------------------------------------------------------

In Ozone, "ozone.metadata.dirs" is used in many places as the fallback when a 
more specific property is not defined. If OM, SCM, DN, and S3g are all 
installed on different nodes, this fallback does not cause any problem. But 
if they are on the same node, there will be conflicts: for example, OM has a 
ratis directory, SCM also has a ratis directory, and so does DN. So for the 
fallback, we should add the component name to the directory, for example 
scm/ratis and om/ratis, creating the scm directory first and then ratis under it.
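A minimal sketch of what such a component-scoped fallback could look like. The class and method names are hypothetical and are not Ozone's actual ServerUtils API; the configuration is modeled as a plain Map so the example is self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Ozone. */
public final class ComponentFallbackDirs {
  private ComponentFallbackDirs() { }

  /**
   * Returns the explicitly configured directory for specificKey if it is set.
   * Otherwise falls back to ozone.metadata.dirs/component/subDir, so that
   * colocated components do not end up sharing one directory.
   */
  public static Path resolve(Map<String, String> conf, String specificKey,
      String component, String subDir) {
    String explicit = conf.get(specificKey);
    if (explicit != null && !explicit.trim().isEmpty()) {
      return Paths.get(explicit.trim());
    }
    String metaDir = conf.get("ozone.metadata.dirs");
    if (metaDir == null || metaDir.trim().isEmpty()) {
      throw new IllegalArgumentException("ozone.metadata.dirs is not set");
    }
    // Component name comes first, then the shared sub-directory name.
    return Paths.get(metaDir.trim(), component, subDir);
  }
}
```

With only "ozone.metadata.dirs" set, calling resolve for the components "scm" and "om" with subDir "ratis" yields two different directories, scm/ratis and om/ratis, under the metadata directory.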
Besides ratis, rocksdb is also a directory common to the major Ozone 
components, so we had better check whether the rocksdb configuration has the 
same problem. 
cc [~gargijaiswal]
The general idea is to check everywhere that "ozone.metadata.dirs" is used, to 
see whether a prefix or a component parent directory needs to be added. 
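One way to picture that check is a small audit that takes the directory each component would resolve to and reports any directory claimed by more than one component. This is only an illustrative sketch: the class name and the input map are assumptions, and a real audit would have to collect the resolved paths from the Ozone code itself.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical audit helper for illustration only; not part of Ozone. */
public final class FallbackDirAudit {
  private FallbackDirAudit() { }

  /**
   * Groups component names by their normalized resolved directory and
   * returns only the directories used by two or more components.
   */
  public static Map<String, List<String>> findShared(
      Map<String, String> resolvedDirByComponent) {
    Map<String, List<String>> ownersByDir = new TreeMap<>();
    for (Map.Entry<String, String> e : resolvedDirByComponent.entrySet()) {
      String dir = Paths.get(e.getValue()).normalize().toString();
      ownersByDir.computeIfAbsent(dir, k -> new ArrayList<>()).add(e.getKey());
    }
    // Keep only directories with more than one owner.
    ownersByDir.values().removeIf(owners -> owners.size() < 2);
    return ownersByDir;
  }
}
```

If OM, SCM, and DN all resolve to the same ratis directory, findShared returns a single entry listing all three; once each path carries its component name, the result is empty.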


> Ozone datanode startup exception on colocated hosts locked the storage 
> directory:./ratis/
> -----------------------------------------------------------------------------------------
>
>                 Key: HDDS-13866
>                 URL: https://issues.apache.org/jira/browse/HDDS-13866
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone CLI
>    Affects Versions: 2.0.0
>            Reporter: Soumitra Sulav
>            Assignee: Gargi Jaiswal
>            Priority: Major
>              Labels: installer
>
>  
> Common properties set on a colocated host for SCM/OM HA and datanodes.
> {code:java}
> <property>
>     <name>hdds.datanode.dir</name>
>     <value>DATA_BASE/data/dn</value>
> </property>
> <property>
>     <name>dfs.container.ratis.datanode.storage.dir</name>
>     <value>DATA_BASE/data/ratis</value>
> </property>
> <property>
>     <name>ozone.metadata.dirs</name>
>     <value>DATA_BASE/meta/data</value>
> </property>
> <property>
>     <name>ozone.scm.datanode.id.dir</name>
>     <value>DATA_BASE/meta/scm</value>
> </property>
> <property>
>     <name>ozone.om.ratis.snapshot.dir</name>
>     <value>DATA_BASE/meta/ratis</value>
> </property>
> <property>
>     <name>ozone.om.db.dirs</name>
>     <value>DATA_BASE/meta/om</value>
> </property>{code}
> Error seen in datanode logs
> {code:java}
> 2025-11-01 12:33:10,363 [main] INFO org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet: Added Volume : /data/ozone/data/dn/hdds to VolumeSet
> 2025-11-01 12:33:10,366 [main] WARN org.apache.hadoop.hdds.server.ServerUtils: Storage directory for Ratis is not configured. It is a good idea to map this to an SSD disk. Falling back to ozone.metadata.dirs
> 2025-11-01 12:33:10,372 [main] INFO org.apache.hadoop.hdds.fs.SaveSpaceUsageToFile: Cached usage info found in /data/ozone/meta/data/ratis/scmUsed: 4231168 at 2025-11-01T12:33:06.540Z{code}
>  
> {code:java}
> 2025-11-01 12:33:15,551 [a3f6b64c-0d96-4997-aaa4-ecf78411ab1c-impl-thread1] ERROR org.apache.ratis.server.storage.RaftStorageDirectory: It appears that another process has already locked the storage directory: /data/ozone/meta/data/ratis/b4d30077-850e-3900-91cf-d0d586af6951
> java.nio.channels.OverlappingFileLockException
>         at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:226)
>         at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:193)
>         at org.apache.ratis.util.JavaUtils.lambda$attempt$7(JavaUtils.java:212)
>         at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:225)
>         at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:212)
>         at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:45)
>         at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:193)
>         at org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:156)
>         at org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:106)
>         at org.apache.ratis.server.storage.RaftStorageImpl.initialize(RaftStorageImpl.java:66)
>         at org.apache.ratis.server.storage.StorageImplUtils$Op.recover(StorageImplUtils.java:176)
>         at org.apache.ratis.server.storage.StorageImplUtils$Op.run(StorageImplUtils.java:129)
>         at org.apache.ratis.server.storage.StorageImplUtils.initRaftStorage(StorageImplUtils.java:100)
>         at org.apache.ratis.server.impl.ServerState.lambda$new$2(ServerState.java:118)
>         at org.apache.ratis.util.MemoizedCheckedSupplier.get(MemoizedCheckedSupplier.java:68)
>         at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:140)
>         at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:387)
>         at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:203)
>         at org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:182)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>         at java.base/java.lang.Thread.run(Thread.java:840) {code}
>  
>  


