SteNicholas opened a new pull request, #2458: URL: https://github.com/apache/celeborn/pull/2458
### What changes were proposed in this pull request?

`RocksDBProvider` creates the non-existent multi-level directory for RocksDB initialization.

### Why are the changes needed?

At present, `RocksDBProvider` relies on `Options#setCreateIfMissing` to create the database when initializing RocksDB, which causes the following exception when `dbFile` is a non-existent multi-level directory.

```
2024-04-09T03:19:35.6807077Z 24/04/09 03:19:35,679 ERROR [pool-1-thread-1-ScalaTest-running-StorageManagerSuite] RocksDBProvider: error opening rocksdb file /tmp/recover/recovery.rdb. Creating new file, will not be able to recover state for existing applications
2024-04-09T03:19:35.6810066Z org.rocksdb.RocksDBException: While mkdir if missing: /tmp/recover/recovery.rdb: No such file or directory
2024-04-09T03:19:35.6811303Z at org.rocksdb.RocksDB.open(Native Method)
2024-04-09T03:19:35.6812052Z at org.rocksdb.RocksDB.open(RocksDB.java:259)
2024-04-09T03:19:35.6813431Z at org.apache.celeborn.service.deploy.worker.shuffledb.RocksDBProvider.initRockDB(RocksDBProvider.java:66)
2024-04-09T03:19:35.6815230Z at org.apache.celeborn.service.deploy.worker.shuffledb.DBProvider.initDB(DBProvider.java:39)
2024-04-09T03:19:35.6816975Z at org.apache.celeborn.service.deploy.worker.storage.StorageManager.<init>(StorageManager.scala:216)
2024-04-09T03:19:35.6818904Z at org.apache.celeborn.service.deploy.worker.storage.StorageManagerSuite.$anonfun$new$1(StorageManagerSuite.scala:30)
2024-04-09T03:19:35.6820538Z at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
2024-04-09T03:19:35.6821620Z at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
2024-04-09T03:19:35.6822585Z at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
2024-04-09T03:19:35.6823948Z at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
2024-04-09T03:19:35.6824908Z at org.scalatest.Transformer.apply(Transformer.scala:22)
2024-04-09T03:19:35.6825862Z at org.scalatest.Transformer.apply(Transformer.scala:20)
2024-04-09T03:19:35.6827073Z at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
2024-04-09T03:19:35.6828439Z at org.apache.celeborn.CelebornFunSuite.withFixture(CelebornFunSuite.scala:157)
2024-04-09T03:19:35.6829909Z at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
2024-04-09T03:19:35.6831386Z at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
2024-04-09T03:19:35.6832590Z at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
2024-04-09T03:19:35.6833727Z at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
2024-04-09T03:19:35.6835034Z at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
2024-04-09T03:19:35.6836660Z at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6838253Z at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
2024-04-09T03:19:35.6839512Z at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
2024-04-09T03:19:35.6840766Z at org.apache.celeborn.CelebornFunSuite.runTest(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6842131Z at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
2024-04-09T03:19:35.6843459Z at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
2024-04-09T03:19:35.6844543Z at scala.collection.immutable.List.foreach(List.scala:431)
2024-04-09T03:19:35.6845566Z at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
2024-04-09T03:19:35.6846677Z at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
2024-04-09T03:19:35.6847722Z at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
2024-04-09T03:19:35.6849045Z at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
2024-04-09T03:19:35.6850358Z at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
2024-04-09T03:19:35.6851608Z at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
2024-04-09T03:19:35.6852566Z at org.scalatest.Suite.run(Suite.scala:1114)
2024-04-09T03:19:35.6853295Z at org.scalatest.Suite.run$(Suite.scala:1096)
2024-04-09T03:19:35.6854857Z at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
2024-04-09T03:19:35.6856472Z at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
2024-04-09T03:19:35.6857654Z at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
2024-04-09T03:19:35.6858737Z at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
2024-04-09T03:19:35.6859974Z at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
2024-04-09T03:19:35.6861519Z at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6863041Z at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
2024-04-09T03:19:35.6864233Z at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
2024-04-09T03:19:35.6865355Z at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
2024-04-09T03:19:35.6866487Z at org.apache.celeborn.CelebornFunSuite.run(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6867764Z at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
2024-04-09T03:19:35.6869119Z at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
2024-04-09T03:19:35.6870146Z at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
2024-04-09T03:19:35.6871069Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2024-04-09T03:19:35.6872490Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2024-04-09T03:19:35.6873824Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2024-04-09T03:19:35.6874805Z at java.lang.Thread.run(Thread.java:750)
2024-04-09T03:19:35.6887377Z 24/04/09 03:19:35,687 ERROR [pool-1-thread-1-ScalaTest-running-StorageManagerSuite] StorageManager: Init level DB failed:
2024-04-09T03:19:35.6889076Z java.io.IOException: Unable to create state store
2024-04-09T03:19:35.6890473Z at org.apache.celeborn.service.deploy.worker.shuffledb.RocksDBProvider.initRockDB(RocksDBProvider.java:98)
2024-04-09T03:19:35.6894605Z at org.apache.celeborn.service.deploy.worker.shuffledb.DBProvider.initDB(DBProvider.java:39)
2024-04-09T03:19:35.6904452Z at org.apache.celeborn.service.deploy.worker.storage.StorageManager.<init>(StorageManager.scala:216)
2024-04-09T03:19:35.6936013Z at org.apache.celeborn.service.deploy.worker.storage.StorageManagerSuite.$anonfun$new$1(StorageManagerSuite.scala:30)
2024-04-09T03:19:35.6937634Z at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
2024-04-09T03:19:35.6938639Z at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
2024-04-09T03:19:35.6939493Z at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
2024-04-09T03:19:35.6940348Z at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
2024-04-09T03:19:35.6941200Z at org.scalatest.Transformer.apply(Transformer.scala:22)
2024-04-09T03:19:35.6942029Z at org.scalatest.Transformer.apply(Transformer.scala:20)
2024-04-09T03:19:35.6943079Z at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
2024-04-09T03:19:35.6944350Z at org.apache.celeborn.CelebornFunSuite.withFixture(CelebornFunSuite.scala:157)
2024-04-09T03:19:35.6945683Z at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
2024-04-09T03:19:35.6947057Z at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
2024-04-09T03:19:35.6948181Z at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
2024-04-09T03:19:35.6949222Z at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
2024-04-09T03:19:35.6950415Z at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
2024-04-09T03:19:35.6951915Z at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6953391Z at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
2024-04-09T03:19:35.6954811Z at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
2024-04-09T03:19:35.6955990Z at org.apache.celeborn.CelebornFunSuite.runTest(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6957249Z at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
2024-04-09T03:19:35.6958473Z at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
2024-04-09T03:19:35.6959465Z at scala.collection.immutable.List.foreach(List.scala:431)
2024-04-09T03:19:35.6960422Z at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
2024-04-09T03:19:35.6961411Z at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
2024-04-09T03:19:35.6962347Z at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
2024-04-09T03:19:35.6963404Z at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
2024-04-09T03:19:35.6964635Z at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
2024-04-09T03:19:35.6965797Z at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
2024-04-09T03:19:35.6966679Z at org.scalatest.Suite.run(Suite.scala:1114)
2024-04-09T03:19:35.6967341Z at org.scalatest.Suite.run$(Suite.scala:1096)
2024-04-09T03:19:35.6968835Z at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
2024-04-09T03:19:35.6970329Z at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
2024-04-09T03:19:35.6971604Z at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
2024-04-09T03:19:35.6972573Z at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
2024-04-09T03:19:35.6973695Z at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
2024-04-09T03:19:35.6975279Z at org.apache.celeborn.CelebornFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6976986Z at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
2024-04-09T03:19:35.6978212Z at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
2024-04-09T03:19:35.6979969Z at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
2024-04-09T03:19:35.6982049Z at org.apache.celeborn.CelebornFunSuite.run(CelebornFunSuite.scala:35)
2024-04-09T03:19:35.6982851Z at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
2024-04-09T03:19:35.6983608Z at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
2024-04-09T03:19:35.6984185Z at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
2024-04-09T03:19:35.6984688Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2024-04-09T03:19:35.6985336Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2024-04-09T03:19:35.6986127Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2024-04-09T03:19:35.6986690Z at java.lang.Thread.run(Thread.java:750)
2024-04-09T03:19:35.6987383Z Caused by: org.rocksdb.RocksDBException: While mkdir if missing: /tmp/recover/recovery.rdb: No such file or directory
2024-04-09T03:19:35.6988110Z at org.rocksdb.RocksDB.open(Native Method)
2024-04-09T03:19:35.6988520Z at org.rocksdb.RocksDB.open(RocksDB.java:259)
2024-04-09T03:19:35.6989257Z at org.apache.celeborn.service.deploy.worker.shuffledb.RocksDBProvider.initRockDB(RocksDBProvider.java:96)
2024-04-09T03:19:35.6989929Z ... 48 more
```

RocksDB's [CreateDirIfMissing](https://github.com/facebook/rocksdb/blob/main/env/fs_posix.cc#L637) calls `mkdir`, which creates only the last path component and fails with "No such file or directory" when a parent directory is missing. `CreateDirIfMissing` therefore cannot create a non-existent multi-level directory, so `RocksDBProvider` should create the non-existent multi-level directory itself before initializing RocksDB.

```
IOStatus CreateDirIfMissing(const std::string& name,
                            const IOOptions& /*opts*/,
                            IODebugContext* /*dbg*/) override {
  if (mkdir(name.c_str(), 0755) != 0) {
    if (errno != EEXIST) {
      return IOError("While mkdir if missing", name, errno);
    } else if (!DirExists(name)) {  // Check that name is actually a
                                    // directory.
      // Message is taken from mkdir
      return IOStatus::IOError("`" + name + "' exists but is not a directory");
    }
  }
  return IOStatus::OK();
}
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`DBProviderSuiteJ`
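To illustrate the directory-creation step described under "Why are the changes needed?", here is a minimal sketch in plain Java. It is not the code from this PR: the class `DbDirSketch` and the helper `ensureDbDir` are hypothetical names, and the sketch assumes that creating the whole `dbFile` path up front is acceptable, since `CreateDirIfMissing` tolerates an already existing directory (the `EEXIST` branch above). It uses only `java.nio.file.Files.createDirectories`, which creates every missing parent and does not fail when the directory already exists.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the actual RocksDBProvider change.
class DbDirSketch {

  /**
   * Creates dbFile as a directory together with all missing parent
   * directories. Calling it again on an existing directory is a no-op.
   */
  static Path ensureDbDir(File dbFile) throws IOException {
    return Files.createDirectories(dbFile.toPath());
  }

  public static void main(String[] args) throws IOException {
    // Mimic the failing layout: <base>/recover/recovery.rdb, where
    // "recover" does not exist yet.
    Path base = Files.createTempDirectory("db-dir-sketch");
    File dbFile = base.resolve("recover").resolve("recovery.rdb").toFile();

    ensureDbDir(dbFile);
    // At this point a provider could open RocksDB on dbFile; that call
    // is omitted so the sketch has no dependency on the RocksDB library.
    System.out.println("directory exists: " + dbFile.isDirectory());
  }
}
```

Creating only `dbFile.getParentFile()` would also be enough for `mkdir` to succeed on the last component; which variant fits best depends on how the provider handles an existing but empty directory.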
