hangc0276 opened a new issue, #3193:
URL: https://github.com/apache/bookkeeper/issues/3193

   **BP**
   
   > Follow the instructions at 
http://bookkeeper.apache.org/community/bookkeeper_proposals/ to create a 
proposal.
   
   This is the master ticket for tracking BP-50:
   
   ### Motivation
   PR https://github.com/apache/bookkeeper/pull/2742 introduced 
the `FileChannelProvider` interface to support different file providers. The 
default, `DefaultFileChannelProvider`, uses the standard file system to store 
journal files.
   
   For [Intel Optane Persistent 
Memory](https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html)
 disks (Pmem), the matching [PMDK](https://pmem.io/pmdk/) library can provide 
higher throughput and lower latency than the standard file system.
   
   With the PMDK library, a file on a Pmem disk is pre-written in full before 
any data is written into it. **That means each file stored on the Pmem disk is 
written to the disk twice.** Our tests on a Pmem disk confirmed this behavior.
   
   There are two options to fix this issue:
    - Recompile the Linux kernel to turn off the pre-write feature.
    - Provide a file pool on the application side to reuse the files stored on 
the Pmem disk.
   
   Considering the read/write model of the BookKeeper journal disk, we 
recommend adding a reuse feature for journal files so that Intel Pmem disks 
can achieve high throughput.
   
   ### Proposal
   
   We provide a flag, `journalReuseFiles`, in conf/bk_server.conf to control 
whether journal file reuse is enabled. The flag defaults to `false`. The 
journal file pool size is controlled by `journalMaxBackups`. When we need a log 
file for writing, we first check whether the log file pool already holds enough 
log files. If not, we create a new one; otherwise, we choose an old log file to 
overwrite.
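
The create-or-overwrite decision described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not BookKeeper code: the class `JournalFilePoolSketch`, the method `nextFile`, and the `journal-N` names are hypothetical, and "choose an old log file" is assumed here to mean the oldest one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical in-memory model of the journal file pool; not BookKeeper code. */
public class JournalFilePoolSketch {
    private final int maxBackups;                          // stands in for journalMaxBackups
    private final Deque<String> pool = new ArrayDeque<>(); // oldest file first
    private int created = 0;                               // number of files created so far

    public JournalFilePoolSketch(int maxBackups) {
        if (maxBackups <= 0) {
            throw new IllegalArgumentException("maxBackups must be positive");
        }
        this.maxBackups = maxBackups;
    }

    /** Returns the name of the file the next journal should be written to. */
    public String nextFile() {
        if (pool.size() < maxBackups) {
            // Pool is not full yet: create a new file.
            String name = "journal-" + created++;
            pool.addLast(name);
            return name;
        }
        // Pool is full: overwrite the oldest file and mark it as the newest.
        String oldest = pool.removeFirst();
        pool.addLast(oldest);
        return oldest;
    }

    public static void main(String[] args) {
        JournalFilePoolSketch pool = new JournalFilePoolSketch(2);
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.nextFile());
        }
    }
}
```

With a pool size of 2, the first two calls create new files and every later call cycles through the existing ones, so no new file has to be pre-written once the pool is full.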
   
   One problem is the relationship between the journal file name and the 
logId. The current implementation keeps the journal file name in sync with the 
logId. The logId is stored in the journal mark file at checkpoint.
   
   When the journal file reuse feature is enabled, we should decouple the 
journal file name from the logId. The mapping strategy is 
`logFileName = Long.toHexString(logId % journalMaxBackups)`.
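
As a quick illustration of the mapping, the sketch below applies the formula to a few logIds. The class and method names are hypothetical, and it assumes logIds are non-negative and `journalMaxBackups` is positive.

```java
/** Hypothetical helper illustrating the proposed logId-to-file-name mapping. */
public class JournalFileNameSketch {

    /** Applies logFileName = Long.toHexString(logId % journalMaxBackups). */
    static String logFileName(long logId, int journalMaxBackups) {
        // Assumes logId >= 0 and journalMaxBackups > 0.
        return Long.toHexString(logId % journalMaxBackups);
    }

    public static void main(String[] args) {
        // With a pool of 5 files, logIds 7 and 12 map to the same file name.
        System.out.println(logFileName(7, 5));
        System.out.println(logFileName(12, 5));
        // The name is hexadecimal: 26 % 16 = 10, which is "a" in hex.
        System.out.println(logFileName(26, 16));
    }
}
```

Because different logIds can map to the same name, the file name alone no longer identifies a logId once reuse is enabled.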
   
   When the bookie replays journal files on startup, it finds the starting 
journal log file from the markPosition stored in journalLastMark and replays 
the journal files that follow. The risk is that some old journal files may be 
replayed, but this does not affect data correctness.
   
   <!-- add a proposal PR link below -->
   Proposal PR - #

