[GitHub] [bookkeeper] dlg99 opened a new issue, #3734: RocksDB: segfault in org.rocksdb.WriteBatch::delete called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers

GitBox Wed, 11 Jan 2023 13:50:56 -0800


dlg99 opened a new issue, #3734:
URL: https://github.com/apache/bookkeeper/issues/3734

**BUG REPORT**

***Describe the bug***

A production server crashed with a segfault in RocksDB.
Unfortunately, the crash dump was lost. The logs point to
org.rocksdb.WriteBatch::delete, called from
org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers

Without the crash dump it is hard to pinpoint the issue or match it to a
specific RocksDB bug. I cannot reproduce the problem in a unit test, and even
if I could, I would not know whether it is the same problem.

So far the crash has happened only once. The timing and the code path roughly
correlate with an upgrade to an internal version (BK 4.14.x, which uses
RocksDB 6.16.4) that includes the change introducing range deletion with RocksDB:
https://github.com/apache/bookkeeper/pull/3653
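
For readers unfamiliar with the distinction, here is a minimal sketch of per-key deletion versus range deletion over a sorted key space. It is not the BookKeeper code and does not call RocksDB: a `TreeMap` stands in for the store, and the `{ledgerId, entryId}` key layout, the class name `RangeDeleteSketch` and its methods are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeMap;

// Illustration only: a TreeMap stands in for a sorted key-value store.
// The {ledgerId, entryId} key layout is an assumption, not taken from the issue.
final class RangeDeleteSketch {
    // Order keys by ledgerId first, then by entryId.
    static final Comparator<long[]> KEY_ORDER =
            Comparator.<long[]>comparingLong(k -> k[0]).thenComparingLong(k -> k[1]);

    static TreeMap<long[], Long> newIndex() {
        return new TreeMap<>(KEY_ORDER);
    }

    // Per-key deletion: walk the ledger's keys and remove them one at a time.
    // Returns the number of individual delete operations issued.
    static int deletePerKey(TreeMap<long[], Long> index, long ledgerId) {
        int deleted = 0;
        Iterator<long[]> it =
                index.tailMap(new long[] {ledgerId, 0L}, true).keySet().iterator();
        while (it.hasNext()) {
            long[] key = it.next();
            if (key[0] != ledgerId) {
                break; // reached the next ledger
            }
            it.remove();
            deleted++;
        }
        return deleted;
    }

    // Range deletion: a single operation covering [first key, last key] of the ledger.
    // Assumes entry ids are non-negative.
    static void deleteRange(TreeMap<long[], Long> index, long ledgerId) {
        index.subMap(new long[] {ledgerId, 0L}, true,
                     new long[] {ledgerId, Long.MAX_VALUE}, true).clear();
    }

    public static void main(String[] args) {
        TreeMap<long[], Long> index = newIndex();
        for (long ledger = 1; ledger <= 3; ledger++) {
            for (long entry = 0; entry < 4; entry++) {
                index.put(new long[] {ledger, entry}, ledger * 100 + entry);
            }
        }
        deleteRange(index, 2L);
        System.out.println("entries left: " + index.size());
    }
}
```

Both methods leave the index in the same state; the difference is the number of operations issued, which is presumably where any performance gain from range deletion comes from.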

After some research, my gut feeling is that the problem is related to the fix
for "a bug in iterator refresh which could segfault for DeleteRange users":
https://github.com/facebook/rocksdb/pull/10739
That fix should be included in RocksDB 7.8.0; I do not see it in the 6.x versions.
Instead, I see that 6.29.0 "Added API warning against using Iterator::Refresh()
together with DB::DeleteRange(), which are incompatible and have always risked
causing the refreshed iterator to return incorrect results."
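
To make the version reasoning concrete, here is a small sketch that checks whether a RocksDB version string is at or above 7.8.0, the release I believe first carries the fix. The class `RocksDbVersionCheck` and its methods are hypothetical helpers, not part of RocksDB or BookKeeper, and the 7.8.0 threshold is only my reading of the release notes.

```java
import java.util.Arrays;

// Hypothetical helper: compares dotted RocksDB version strings numerically.
final class RocksDbVersionCheck {
    // Assumption from the release notes: the iterator refresh segfault fix first ships in 7.8.0.
    static final String FIRST_FIXED = "7.8.0";

    // Split "6.29.4.1" into {6, 29, 4, 1}.
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Negative if a < b, zero if equal, positive if a > b; missing components count as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean hasRefreshFix(String version) {
        return compare(version, FIRST_FIXED) >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"6.16.4", "6.29.4.1", "6.29.5", "7.8.0"}) {
            System.out.println(v + " has fix: " + hasRefreshFix(v));
        }
    }
}
```

Under that assumption, none of the 6.x versions mentioned in this issue pass the check, including 6.29.5.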

With that said, we have the following options:

1. Do nothing and hope the problem is extremely rare.
2. Revert https://github.com/apache/bookkeeper/pull/3653 (cc @hangc0276). Do
you have any perf test results showing how much this PR improved performance,
to help us decide whether it is worth keeping?
3. Upgrade RocksDB to 7.8.0+. An upgrade to 7.x was attempted at
https://github.com/apache/bookkeeper/pull/3568 but will need more work, at
least for the backwards-compatibility tests, and that assumes there is no data
incompatibility. I see some changes that drop data format options and may
affect downgrade, so there is a risk.
4. Upgrade to RocksDB 6.29.5. This sounds like option 1 with extra steps,
but there are multiple fixes between 6.16.4 (or even 6.29.4.1, used by BK 4.16)
and 6.29.5 that might reduce the chance of the problem surfacing, e.g.:

```
Fixed a bug caused by race among flush, incoming writes and taking
snapshots. Queries to snapshots created with these race condition can return
incorrect result, e.g. resurfacing deleted data.
Fixed a bug that DisableManualCompaction may assert when disable an
unscheduled manual compaction.
Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange()
performed.
Fixed a race condition when disable and re-enable manual compaction.
Fix a race condition when cancel manual compaction with
DisableManualCompaction. Also DB close can cancel the manual compaction thread.
Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads
waiting for recovery to complete (#9496)
Fixed a read-after-free bug in DB::GetMergeOperands().

Fix a data loss bug for 2PC write-committed transaction caused by concurrent
transaction commit and memtable switch

Fixed a major bug in which batched MultiGet could return old values for keys
deleted by DeleteRange when memtable Bloom filter is enabled
```

***To Reproduce***

Cannot reproduce.

***Expected behavior***

No segfault.


