hangc0276 opened a new pull request #2965:
URL: https://github.com/apache/bookkeeper/pull/2965


   ### Motivation
   When the RocksDB-backed entryMetadataMap is used with multiple ledger directories configured, the bookie fails to start and throws the following exception:
   ```
   12:24:28.530 [main] ERROR org.apache.pulsar.PulsarStandaloneStarter - Failed to start pulsar service.
   java.io.IOException: Error open RocksDB database
           at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:202) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.storage.ldb.PersistentEntryLogMetadataMap.<init>(PersistentEntryLogMetadataMap.java:87) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.createEntryLogMetadataMap(GarbageCollectorThread.java:265) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.<init>(GarbageCollectorThread.java:154) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.<init>(GarbageCollectorThread.java:133) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:182) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:190) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.bookkeeper.bookie.BookieResources.createLedgerStorage(BookieResources.java:110) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.buildBookie(LocalBookkeeperEnsemble.java:328) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.8.1.jar:2.8.1]
           at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runBookies(LocalBookkeeperEnsemble.java:391) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.8.1.jar:2.8.1]
           at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startStandalone(LocalBookkeeperEnsemble.java:521) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.8.1.jar:2.8.1]
           at org.apache.pulsar.PulsarStandalone.start(PulsarStandalone.java:264) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
           at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:121) [org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
   Caused by: org.rocksdb.RocksDBException: lock hold by current process, acquire time 1640492668 acquiring thread 123145515651072: data/standalone/bookkeeper00/entrylogIndexCache/metadata-cache/LOCK: No locks available
           at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
           at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
           at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.15.0-SNAPSHOT.jar:4.15.0-SNAPSHOT]
           ... 15 more
   ```
   
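   The cause is that every ledger directory gets its own GarbageCollectorThread, and all of them try to open the same RocksDB database. The first thread holds the database's LOCK file, so each later open fails with the exception above.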
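
The collision can be illustrated without RocksDB. The sketch below is only an analogy: it uses `java.nio` file locks rather than RocksDB's own locking, and the class and method names (`SharedLockSketch`, `secondOpenIsRejected`) are hypothetical. It shows that a second attempt to lock the same `LOCK` file from inside one JVM is rejected while the first lock is held, which is the same shape of failure as the trace above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; this is not BookKeeper or RocksDB code.
final class SharedLockSketch {

    /**
     * Takes a lock on dbDir/LOCK, then tries to take it a second time from the
     * same JVM. Returns true if the second attempt is rejected.
     */
    static boolean secondOpenIsRejected(Path dbDir) throws IOException {
        Path lockFile = dbDir.resolve("LOCK");
        try (FileChannel first = FileChannel.open(
                     lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(
                     lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock();
            if (held == null) {
                throw new IOException("another process already holds " + lockFile);
            }
            try {
                // Stands in for the second garbage collector thread opening the same database.
                second.tryLock();
                return false;
            } catch (OverlappingFileLockException e) {
                // The JVM refuses a second overlapping lock on the same file.
                return true;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dbDir = Files.createTempDirectory("metadata-cache");
        System.out.println("second open rejected: " + secondOpenIsRejected(dbDir));
    }
}
```

Giving each opener its own directory, and therefore its own `LOCK` file, removes the conflict.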
   
   ### Modification
   1. Change the default `GcEntryLogMetadataCachePath` from `getLedgerDirNames()[0] + "/" + ENTRYLOG_INDEX_CACHE` to `null`. When it is `null`, each garbage collector thread stores its metadata cache in its own ledger directory.
   2. Remove the intermediate directory `entrylogIndexCache`. The resulting layout of each ledger directory looks like this:
   ```
      └── current
          ├── lastMark
          ├── ledgers
          │   ├── 000003.log
          │   ├── CURRENT
          │   ├── IDENTITY
          │   ├── LOCK
          │   ├── LOG
          │   ├── MANIFEST-000001
          │   └── OPTIONS-000005
          ├── locations
          │   ├── 000003.log
          │   ├── CURRENT
          │   ├── IDENTITY
          │   ├── LOCK
          │   ├── LOG
          │   ├── MANIFEST-000001
          │   └── OPTIONS-000005
          └── metadata-cache
              ├── 000003.log
              ├── CURRENT
              ├── IDENTITY
              ├── LOCK
              ├── LOG
              ├── MANIFEST-000001
              └── OPTIONS-000005
   ```
   3. If the user sets `GcEntryLogMetadataCachePath` in `bk_server.conf`, only one ledger directory may be configured in `ledgerDirectories`. Otherwise, the best practice is to leave it at the default.
   4. This PR should preferably be released together with #1949.
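
The path selection described in items 1 to 3 can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `MetadataCachePathSketch` and its `resolve` method are hypothetical, the `current/metadata-cache` suffix is taken from the directory tree above, appending `metadata-cache` to an explicitly configured path is an assumption, and rejecting a shared path with several ledger directories by throwing is also an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names and error handling are assumptions.
final class MetadataCachePathSketch {

    static final String METADATA_CACHE = "metadata-cache";

    /**
     * Returns one metadata-cache directory per ledger directory.
     *
     * @param configuredPath value of GcEntryLogMetadataCachePath, or null for the default
     * @param ledgerDirs     the configured ledger directories
     */
    static List<Path> resolve(String configuredPath, String[] ledgerDirs) {
        if (configuredPath != null && ledgerDirs.length > 1) {
            // A single shared path would be opened once per ledger directory.
            throw new IllegalArgumentException(
                    "an explicit metadata cache path supports only one ledger directory");
        }
        List<Path> result = new ArrayList<>();
        for (String ledgerDir : ledgerDirs) {
            if (configuredPath == null) {
                // Default: keep the cache next to the other databases of this ledger directory.
                result.add(Paths.get(ledgerDir, "current", METADATA_CACHE));
            } else {
                result.add(Paths.get(configuredPath, METADATA_CACHE));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dirs = {"data/bookkeeper0", "data/bookkeeper1"};
        for (Path p : resolve(null, dirs)) {
            System.out.println(p);
        }
    }
}
```

With the default (`null`), every ledger directory resolves to a distinct database directory, so no two garbage collector threads compete for the same `LOCK` file.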

