dlg99 opened a new issue, #3734: URL: https://github.com/apache/bookkeeper/issues/3734
**BUG REPORT**

***Describe the bug***

A prod server crashed because of a segfault in RocksDB. Unfortunately, the crash dump is lost. Logs point to org.rocksdb.WriteBatch::delete called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers

Without the crash dump it is hard to pinpoint the issue or match it to a specific RocksDB bug. I cannot repro the problem in a unit test, and even if I could, I would not know whether it is the same problem.

So far the crash has happened only once. The timing and the code path roughly correlate with an upgrade to an (internal) version (BK 4.14.x uses RocksDB 6.16.4) that includes the change introducing range deletion in RocksDB: https://github.com/apache/bookkeeper/pull/3653

After some research I have a gut feeling that the problem is related to the fix of "a bug in iterator refresh which could segfault for DeleteRange users": https://github.com/facebook/rocksdb/pull/10739

This fix should be included in RocksDB 7.8.0; I do not see it in the 6.x versions. Instead, I see that 6.29.0 has "Added API warning against using Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and have always risked causing the refreshed iterator to return incorrect results."

With that said, we have the following options:

1. Do nothing and hope the problem is extremely rare.
2. Revert https://github.com/apache/bookkeeper/pull/3653. cc @hangc0276 - do you have any perf test results showing how much this PR improved performance, to help decide whether it is worth keeping rather than reverting?
3. Upgrade RocksDB to 7.8.0+. An upgrade to 7.x was attempted at https://github.com/apache/bookkeeper/pull/3568, but it will need more work on backward-compatibility tests (at least), assuming there is no data incompatibility. I see some changes that drop data format options and may affect downgrade, so there is a risk.
4. Upgrade to RocksDB 6.29.5.
   This sounds like option 1 with extra steps, but there are multiple fixes between 6.16.4 (or even 6.29.4.1, used by BK 4.16) and 6.29.5 that might reduce the chance of the problem surfacing, e.g.:

   ```
   Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.
   Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.
   Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.
   Fixed a race condition when disable and re-enable manual compaction.
   Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
   Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
   Fixed a read-after-free bug in DB::GetMergeOperands().
   Fix a data loss bug for 2PC write-committed transaction caused by concurrent transaction commit and memtable switch
   Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled
   ```

***To Reproduce***

Cannot repro.

***Expected behavior***

No segfault.
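
For readers unfamiliar with the distinction between per-key deletes and the range deletion discussed in this issue, here is a minimal sketch of the two styles. It uses a `java.util.TreeMap` as a stand-in for a sorted key-value store; it is not RocksDB or BookKeeper code. The `EntryKey` class, the assumed (ledgerId, entryId) key ordering, and all method names are hypothetical and exist only for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class RangeDeleteSketch {

    /** Hypothetical composite key; sorts by ledgerId first, then entryId. */
    static final class EntryKey implements Comparable<EntryKey> {
        final long ledgerId;
        final long entryId;

        EntryKey(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(EntryKey other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    /** Per-key style: walk the ledger's entries and delete each one individually. */
    static int deletePerKey(TreeMap<EntryKey, Long> index, long ledgerId) {
        int deleted = 0;
        Iterator<Map.Entry<EntryKey, Long>> it =
                index.tailMap(new EntryKey(ledgerId, 0), true).entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().ledgerId != ledgerId) {
                break; // reached the next ledger; leave it untouched
            }
            it.remove();
            deleted++;
        }
        return deleted;
    }

    /** Range style: one operation over [(ledgerId, 0), (ledgerId + 1, 0)). */
    static void deleteRange(TreeMap<EntryKey, Long> index, long ledgerId) {
        // Assumes entry ids are non-negative and ledgerId < Long.MAX_VALUE.
        index.subMap(new EntryKey(ledgerId, 0), true,
                     new EntryKey(ledgerId + 1, 0), false).clear();
    }

    /** Three ledgers (1, 2, 3) with entries 0..2 each; values are made-up offsets. */
    static TreeMap<EntryKey, Long> sampleIndex() {
        TreeMap<EntryKey, Long> index = new TreeMap<>();
        for (long ledger = 1; ledger <= 3; ledger++) {
            for (long entry = 0; entry < 3; entry++) {
                index.put(new EntryKey(ledger, entry), ledger * 100 + entry);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        TreeMap<EntryKey, Long> perKey = sampleIndex();
        TreeMap<EntryKey, Long> ranged = sampleIndex();
        int deletes = deletePerKey(perKey, 2);
        deleteRange(ranged, 2);
        System.out.println("per-key deletes issued: " + deletes);
        System.out.println("same remaining entries: " + perKey.equals(ranged));
    }
}
```

Both styles leave the same entries behind; the difference is that the range style expresses the removal as a single bounded operation instead of one delete per key. The sketch only illustrates that semantic equivalence and says nothing about the iterator-refresh segfault itself.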
