Gargi-jais11 commented on PR #9318: URL: https://github.com/apache/ozone/pull/9318#issuecomment-3649584611
@ChenSammi and @aryangupta1998, please take another look at the patch.

The actual reason the `upgrade tests` were failing:

_**Version: 2.0.0**_

**OM logs show:**
```
2025-12-12 05:19:34,415 [om1-impl-thread1] INFO storage.RaftStorageDirectory: The storage directory /data/metadata/ratis/5cb24680-b9e7-3c90-a862-d66704efc61c does not exist. Creating ...
2025-12-12 05:19:34,422 [om1-impl-thread1] INFO storage.RaftStorageDirectory: Lock on /data/metadata/ratis/5cb24680-b9e7-3c90-a862-d66704efc61c/in_use.lock acquired by nodename 7@om1
```
✅ OM uses: /data/metadata/ratis

**SCM logs show:**
```
2025-12-12 05:18:53,458 [bdd3caaa-0deb-43cc-a4ee-d222722bcb29-impl-thread1] INFO storage.RaftStorageDirectory: Lock on /data/metadata/scm-ha/8d6a99a9-6b52-4ed5-bc55-c2abbca60551/in_use.lock acquired by nodename [email protected]
```
✅ SCM uses: /data/metadata/scm-ha

_**Version: 2.2.0 Upgrade**_

**OM logs show the bug:**
```
2025-12-12 05:22:21,680 [main] INFO server.ServerUtils: Found existing Ratis directory at old shared location: /data/metadata/ratis.
2025-12-12 05:22:21,680 [main] INFO server.ServerUtils: Found existing Ratis directory at old shared location: /data/metadata/ratis.
```
The same message is printed TWICE: first for the OM check, then for the SCM check.

Then it crashes:
```
2025-12-12 05:22:21,681 [main] ERROR om.OzoneManagerStarter: java.io.IOException: Path of ozone.om.ratis.storage.dir and ozone.scm.ha.ratis.storage.dir should not be co located.
```

**SCM log falls back correctly:**
```
2025-12-12 05:21:56,966 [main] INFO ha.SCMSnapshotProvider: Initializing SCM Snapshot Provider
2025-12-12 05:21:56,966 [main] WARN server.ServerUtils: Storage directory for Ratis is not configured. It is a good idea to map this to an SSD disk. Falling back to ozone.metadata.dirs
2025-12-12 05:21:56,967 [main] INFO server.ServerUtils: Found existing SCM Ratis directory at old location: /data/metadata/scm-ha. Using it for backward compatibility during upgrade.
```

**Real Issue:** The problem was in the `OzoneManager` co-location check, which caused OM startup to fail even though the backward-compatibility fallback itself worked correctly. The check looks at which directories exist locally instead of at what is actually configured. In distributed setups, SCM directories won't exist on OM machines, so the check raises a false alarm.
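To make the distinction concrete, here is a minimal sketch of a co-location check that compares only the configured (or default) paths and never probes the local filesystem. It is not the code in this patch: the class `RatisDirColocationSketch`, its method names, and the fallback subdirectory names `ratis` and `scm-ha` (taken from the 2.0.0 logs above) are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Ozone: illustrates a config-based check.
final class RatisDirColocationSketch {

  private RatisDirColocationSketch() { }

  /**
   * Resolve a Ratis directory from configuration only.
   * Uses the explicit value when set, otherwise a fallback under the
   * metadata dir. No filesystem existence checks are performed.
   */
  static Path resolve(String configured, String metadataDir,
      String fallbackName) {
    if (configured != null && !configured.trim().isEmpty()) {
      return Paths.get(configured.trim()).toAbsolutePath().normalize();
    }
    return Paths.get(metadataDir, fallbackName).toAbsolutePath().normalize();
  }

  /** True only when both services resolve to the same normalized path. */
  static boolean isColocated(Path omDir, Path scmDir) {
    return omDir.equals(scmDir);
  }

  /** Fail startup only when the resolved paths really collide. */
  static void checkNotColocated(Path omDir, Path scmDir) throws IOException {
    if (isColocated(omDir, scmDir)) {
      throw new IOException("OM and SCM Ratis storage dirs resolve to the "
          + "same path: " + omDir);
    }
  }

  public static void main(String[] args) throws IOException {
    // Neither key is set, so both fall back under the metadata dir
    // (assumed fallback names, mirroring the paths seen in the logs).
    Path om = resolve(null, "/data/metadata", "ratis");
    Path scm = resolve(null, "/data/metadata", "scm-ha");
    checkNotColocated(om, scm);
    System.out.println("OM:  " + om);
    System.out.println("SCM: " + scm);
  }
}
```

Because the comparison depends only on configuration values, the result is the same on an OM host where no SCM directory exists as on a single-node test cluster.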
