[GitHub] [bookkeeper] hangc0276 opened a new issue, #3193: BP-50: Add journal reuse feature to support Intel Pmem disk

GitBox Sat, 09 Apr 2022 01:35:12 -0700


hangc0276 opened a new issue, #3193:
URL: https://github.com/apache/bookkeeper/issues/3193

**BP**

> Follow the instructions at
http://bookkeeper.apache.org/community/bookkeeper_proposals/ to create a
proposal.

This is the master ticket for tracking BP-50:

### Motivation
PR https://github.com/apache/bookkeeper/pull/2742 introduced the
`FileChannelProvider` interface to support different file providers. The
default, `DefaultFileChannelProvider`, uses the standard file system to store
journal files.

For [Intel Optane Persistent
Memory](https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html)
(Pmem) disks, the matching [PMDK](https://pmem.io/pmdk/) library can provide
higher throughput and lower latency than the standard file system.

When a Pmem disk is accessed through the PMDK library, the whole file is
pre-written first, and only then is data written into it. **That means each
file stored on the Pmem disk is written to the disk twice.** Our tests on a
Pmem disk confirmed this behavior.

There are two options to fix this issue.
- Recompile the Linux kernel to turn off the pre-write feature.
- Provide a file pool on the application side to reuse the files stored on the
Pmem disk.

Considering the read/write model of the BookKeeper journal disk, we recommend
adding a reuse feature for journal files so that BookKeeper can achieve high
throughput on Intel Pmem disks.

### Proposal

We provide a flag `journalReuseFiles` in conf/bk_server.conf to control
whether the journal file reuse feature is enabled. The flag defaults to
`false`. The journal file pool size is controlled by `journalMaxBackups`. When
we need a log file for writing, we first check whether the log file pool
already holds enough log files. If not, we create a new one; otherwise, we
choose an old log file to overwrite.
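
The pool check described above can be sketched as follows. This is only an illustration under stated assumptions, not the proposed implementation: the class `JournalFilePoolSketch`, the enum `Action`, and the method `chooseAction` are hypothetical names, and "enough" is assumed to mean that the pool already holds `journalMaxBackups` files.

```java
// Hypothetical sketch of the create-or-overwrite decision; none of these
// names come from the BookKeeper code base.
class JournalFilePoolSketch {
    enum Action { CREATE_NEW, OVERWRITE_OLD }

    // Assumption: the pool is "full enough" once it holds journalMaxBackups files.
    static Action chooseAction(int filesInPool, int journalMaxBackups) {
        if (journalMaxBackups <= 0) {
            throw new IllegalArgumentException("journalMaxBackups must be positive");
        }
        return filesInPool < journalMaxBackups
                ? Action.CREATE_NEW      // pool not yet full: allocate a fresh file
                : Action.OVERWRITE_OLD;  // pool full: reuse an existing file
    }

    public static void main(String[] args) {
        int journalMaxBackups = 5;
        for (int filesInPool = 3; filesInPool <= 6; filesInPool++) {
            System.out.println(filesInPool + " file(s) in pool: "
                    + chooseAction(filesInPool, journalMaxBackups));
        }
    }
}
```

With a pool size of 5, the sketch creates new files while fewer than five exist and switches to overwriting from then on, so each file on the Pmem disk is pre-written only once.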

This raises a problem with the relationship between the journal file name and
the logId. The current implementation keeps the journal file name in sync with
the logId. The logId is stored in the journal mark file through checkpoints.

When the journal file reuse feature is enabled, we should decouple the journal
file name from the logId. The mapping strategy is
`logFileName = Long.toHexString(logId % journalMaxBackups)`.
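
To make the mapping concrete, here is a minimal sketch of how it wraps around. The class `JournalFileNameMapping` and the method `logFileName` are hypothetical names used only for illustration, the logId is assumed to be non-negative, and any file name suffix is left out.

```java
// Hypothetical helper illustrating the proposed mapping; only the formula
// itself comes from the proposal.
class JournalFileNameMapping {
    // Assumes logId >= 0 and journalMaxBackups > 0.
    static String logFileName(long logId, int journalMaxBackups) {
        return Long.toHexString(logId % journalMaxBackups);
    }

    public static void main(String[] args) {
        int journalMaxBackups = 5;
        // logIds 5 and 0 share a name, so logId 5 overwrites the file of logId 0.
        for (long logId : new long[] {0, 4, 5, 12}) {
            System.out.println("logId " + logId + " -> "
                    + logFileName(logId, journalMaxBackups));
        }
    }
}
```

Because the name depends only on `logId % journalMaxBackups`, at most `journalMaxBackups` distinct file names ever exist, and the logId recorded at checkpoint time can no longer be read back from the file name alone.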

When the bookie replays journal files on startup, it finds the starting
journal log file from the markPosition stored in journalLastMark and replays
the journal files that follow. The risk is that it will replay some old
journal files, but this does not affect data correctness.

Proposal PR - #

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
