dlg99 opened a new issue, #3734:
URL: https://github.com/apache/bookkeeper/issues/3734

   **BUG REPORT**
   
   ***Describe the bug***
   
   A prod server crashed because of a segfault in RocksDB. 
   Unfortunately, the crash dump is lost. The logs point to 
org.rocksdb.WriteBatch::delete, called from 
org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers
   
   It is hard to pinpoint the issue or match it to a specific RocksDB bug 
without the crash dump. I cannot repro the problem in a unit test, and even if I 
could, I would not know whether it is the same problem.
   
   So far the crash has happened only once. The timing and the code path 
roughly correlate with an upgrade to an (internal) version (BK 4.14.x uses 
RocksDB 6.16.4) that includes the change introducing range deletion with RocksDB: 
https://github.com/apache/bookkeeper/pull/3653 
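   
   To make the difference between per-key deletes and a range deletion 
concrete, here is a minimal sketch. It does not call RocksDB: a 
java.util.TreeMap stands in for the sorted key space, and the 16-byte 
(ledgerId, entryId) big-endian key layout, the class name and the method names 
are all assumptions made for illustration, not the code from the PR.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustration only: a TreeMap stands in for the sorted key space.
final class RangeDeleteSketch {

    // Assumed key layout: 8-byte ledgerId followed by 8-byte entryId, big-endian.
    static byte[] key(long ledgerId, long entryId) {
        return ByteBuffer.allocate(16).putLong(ledgerId).putLong(entryId).array();
    }

    // Keys are ordered as unsigned bytes, lexicographically.
    static TreeMap<byte[], Long> newIndex() {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedOrder);
    }

    // Per-key style: one delete for every entry of the ledger.
    static int deletePointwise(TreeMap<byte[], Long> index, long ledgerId, long lastEntryId) {
        int removed = 0;
        for (long entryId = 0; entryId <= lastEntryId; entryId++) {
            if (index.remove(key(ledgerId, entryId)) != null) {
                removed++;
            }
        }
        return removed;
    }

    // Range style: drop every key in [(ledgerId, 0), (ledgerId + 1, 0)) at once.
    // Assumes non-negative ledger ids below Long.MAX_VALUE.
    static int deleteRange(TreeMap<byte[], Long> index, long ledgerId) {
        NavigableMap<byte[], Long> range =
                index.subMap(key(ledgerId, 0), true, key(ledgerId + 1, 0), false);
        int removed = range.size();
        range.clear(); // clearing the view removes the keys from the backing map
        return removed;
    }

    public static void main(String[] args) {
        TreeMap<byte[], Long> index = newIndex();
        for (long entryId = 0; entryId < 5; entryId++) {
            index.put(key(1, entryId), entryId);
            index.put(key(2, entryId), entryId);
        }
        System.out.println("range delete removed " + deleteRange(index, 1));
        System.out.println("point deletes removed " + deletePointwise(index, 2, 4));
        System.out.println("remaining keys: " + index.size());
    }
}
```

   In RocksDB itself a range deletion is recorded as a range tombstone rather 
than applied eagerly, which is why it interacts with iterators differently 
from the per-key path.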
   
   After some research, my suspicion is that the problem is the one addressed 
by the fix for "a bug in iterator refresh which could segfault for DeleteRange 
users": 
https://github.com/facebook/rocksdb/pull/10739
   That fix should be included in RocksDB 7.8.0; I do not see it in the 6.x 
versions. Instead, I see that 6.29.0 has "Added API warning against using 
Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and 
have always risked causing the refreshed iterator to return incorrect results." 
   
   With that said, we have the following options:
   
   1. Do nothing and hope the problem is extremely rare. 
   2. Revert https://github.com/apache/bookkeeper/pull/3653 cc @hangc0276 - do 
you have any perf test results showing how much this PR improved performance, 
to help decide whether it is worth keeping?
   3. Upgrade RocksDB to 7.8.0+. An upgrade to 7.x was attempted at 
https://github.com/apache/bookkeeper/pull/3568 but it will need more work on 
backwards compat tests (at least), assuming there is no data incompatibility. I 
see some changes around dropping some data format options that may affect 
downgrade, so there is a risk. 
   4. Upgrade to RocksDB 6.29.5. It sounds like option 1 with extra steps, 
but there are multiple fixes between 6.16.4 (or even 6.29.4.1 used by BK 4.16) 
and 6.29.5 that might reduce the chance of the problem surfacing, e.g.:
   
   ```
   Fixed a bug caused by race among flush, incoming writes and taking 
snapshots. Queries to snapshots created with these race condition can return 
incorrect result, e.g. resurfacing deleted data.
   Fixed a bug that DisableManualCompaction may assert when disable an 
unscheduled manual compaction.
   Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() 
performed.
   Fixed a race condition when disable and re-enable manual compaction.
   Fix a race condition when cancel manual compaction with 
DisableManualCompaction. Also DB close can cancel the manual compaction thread.
   Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads 
waiting for recovery to complete (#9496)
   Fixed a read-after-free bug in DB::GetMergeOperands().
   
   Fix a data loss bug for 2PC write-committed transaction caused by concurrent 
transaction commit and memtable switch 
   
   Fixed a major bug in which batched MultiGet could return old values for keys 
deleted by DeleteRange when memtable Bloom filter is enabled
   ```
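   
   Since the options above hinge on which RocksDB version a given build 
ships, a small ordering check over the versions mentioned here may help when 
auditing deployments. This is only a sketch: the class and method names are 
hypothetical, and passing the check says nothing about whether a particular 
fix was backported.

```java
// Illustration only: orders dotted numeric version strings such as "6.29.4.1".
final class RocksDbVersionCheck {

    // Compares component by component; missing components count as 0.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean atLeast(String version, String minimum) {
        return compareVersions(version, minimum) >= 0;
    }

    public static void main(String[] args) {
        // Versions taken from the discussion above.
        String[] versions = {"6.16.4", "6.29.4.1", "6.29.5", "7.8.0"};
        for (String v : versions) {
            System.out.println(v
                    + " >= 6.29.5: " + atLeast(v, "6.29.5")
                    + ", >= 7.8.0: " + atLeast(v, "7.8.0"));
        }
    }
}
```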
   
   ***To Reproduce***
   
   cannot repro
   
   ***Expected behavior***
   
   no segfault
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

