[ 
https://issues.apache.org/jira/browse/HDDS-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038007#comment-18038007
 ] 

Gargi Jaiswal commented on HDDS-13866:
--------------------------------------

[~Sammi], I checked the RocksDB configuration as well. There appears to be 
*no file locking conflict* there, since the *OM, SCM, Recon and datanode* 
database directories are each named with a component-specific prefix, as below:
{code:java}
/data/metadata/
├── om.db/          ← OM RocksDB (different name)
├── scm.db/         ← SCM RocksDB (different name)  
├── recon.db/       ← Recon RocksDB (different name)
└── ratis/          ← ALL components tried to use THIS! ❌ CONFLICT! {code}
For *ratis*, however, every component resolved to the same directory, so each component needs its own subdirectory, as below:
{code:java}
Before fix:
/data/metadata/ratis/ ← SCM, OM, DataNode ALL tried to use this!❌ 

After fix:
/data/metadata/scm/ratis/ ← SCM only ✅ 
/data/metadata/om/ratis/ ← OM only ✅ 
/data/metadata/datanode/ratis/ ← DataNode only ✅{code}
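
To illustrate the layout above, here is a minimal sketch of deriving a per-component Ratis directory from a shared metadata directory. It is not the actual patch: the class name {{RatisDirSketch}} and the helper {{componentRatisDir}} are hypothetical, and only the JDK's {{java.nio.file}} API is used.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not Ozone code: shows how a component-specific
// subdirectory keeps colocated services out of each other's Ratis storage.
public class RatisDirSketch {

    // Builds <metadataDir>/<component>/ratis, e.g. /data/metadata/om/ratis.
    static Path componentRatisDir(String metadataDir, String component) {
        return Paths.get(metadataDir, component, "ratis");
    }

    public static void main(String[] args) {
        String metadataDir = "/data/metadata"; // assumed shared fallback directory
        for (String component : new String[] {"scm", "om", "datanode"}) {
            // Each component prints a distinct path, so no two would lock the same directory.
            System.out.println(component + " -> " + componentRatisDir(metadataDir, component));
        }
    }
}
```

Because the component name is part of the path, two services falling back to the same metadata directory no longer resolve to the same Ratis storage directory.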

> Ozone datanode startup exception on colocated hosts locked the storage 
> directory:./ratis/
> -----------------------------------------------------------------------------------------
>
>                 Key: HDDS-13866
>                 URL: https://issues.apache.org/jira/browse/HDDS-13866
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone CLI
>    Affects Versions: 2.0.0
>            Reporter: Soumitra Sulav
>            Assignee: Gargi Jaiswal
>            Priority: Major
>              Labels: installer
>
>  
> Common properties set on a colocated host for SCM/OM HA and datanodes.
> {code:java}
> <property>
>     <name>hdds.datanode.dir</name>
>     <value>DATA_BASE/data/dn</value>
> </property>
> <property>
>     <name>dfs.container.ratis.datanode.storage.dir</name>
>     <value>DATA_BASE/data/ratis</value>
> </property>
> <property>
>     <name>ozone.metadata.dirs</name>
>     <value>DATA_BASE/meta/data</value>
> </property>
> <property>
>     <name>ozone.scm.datanode.id.dir</name>
>     <value>DATA_BASE/meta/scm</value>
> </property>
> <property>
>     <name>ozone.om.ratis.snapshot.dir</name>
>     <value>DATA_BASE/meta/ratis</value>
> </property>
> <property>
>     <name>ozone.om.db.dirs</name>
>     <value>DATA_BASE/meta/om</value>
> </property>{code}
> Error seen in datanode logs
> {code:java}
> 2025-11-01 12:33:10,363 [main] INFO 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet: Added 
> Volume : /data/ozone/data/dn/hdds to VolumeSet
> 2025-11-01 12:33:10,366 [main] WARN 
> org.apache.hadoop.hdds.server.ServerUtils: Storage directory for Ratis is not 
> configured. It is a good idea to map this to an SSD disk. Falling back to 
> ozone.metadata.dirs
> 2025-11-01 12:33:10,372 [main] INFO 
> org.apache.hadoop.hdds.fs.SaveSpaceUsageToFile: Cached usage info found in 
> /data/ozone/meta/data/ratis/scmUsed: 4231168 at 2025-11-01T12:33:06.540Z{code}
>  
> {code:java}
> 2025-11-01 12:33:15,551 [a3f6b64c-0d96-4997-aaa4-ecf78411ab1c-impl-thread1] 
> ERROR org.apache.ratis.server.storage.RaftStorageDirectory: It appears that 
> another process has already locked the storage directory: 
> /data/ozone/meta/data/ratis/b4d30077-850e-3900-91cf-d0d586af6951
> java.nio.channels.OverlappingFileLockException
>         at 
> org.apache.ratis.server.storage.RaftStorageDirectoryImpl.tryLock(RaftStorageDirectoryImpl.java:226)
>         at 
> org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lambda$lock$0(RaftStorageDirectoryImpl.java:193)
>         at 
> org.apache.ratis.util.JavaUtils.lambda$attempt$7(JavaUtils.java:212)
>         at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:225)
>         at org.apache.ratis.util.JavaUtils.attempt(JavaUtils.java:212)
>         at org.apache.ratis.util.FileUtils.attempt(FileUtils.java:45)
>         at 
> org.apache.ratis.server.storage.RaftStorageDirectoryImpl.lock(RaftStorageDirectoryImpl.java:193)
>         at 
> org.apache.ratis.server.storage.RaftStorageDirectoryImpl.analyzeStorage(RaftStorageDirectoryImpl.java:156)
>         at 
> org.apache.ratis.server.storage.RaftStorageImpl.analyzeAndRecoverStorage(RaftStorageImpl.java:106)
>         at 
> org.apache.ratis.server.storage.RaftStorageImpl.initialize(RaftStorageImpl.java:66)
>         at 
> org.apache.ratis.server.storage.StorageImplUtils$Op.recover(StorageImplUtils.java:176)
>         at 
> org.apache.ratis.server.storage.StorageImplUtils$Op.run(StorageImplUtils.java:129)
>         at 
> org.apache.ratis.server.storage.StorageImplUtils.initRaftStorage(StorageImplUtils.java:100)
>         at 
> org.apache.ratis.server.impl.ServerState.lambda$new$2(ServerState.java:118)
>         at 
> org.apache.ratis.util.MemoizedCheckedSupplier.get(MemoizedCheckedSupplier.java:68)
>         at 
> org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:140)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:387)
>         at 
> org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:203)
>         at 
> org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:182)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>         at java.base/java.lang.Thread.run(Thread.java:840) {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
