Josh McKenzie created CASSANDRA-18464:
-----------------------------------------

             Summary: Enable Direct I/O For CommitLog Files
                 Key: CASSANDRA-18464
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18464
             Project: Cassandra
          Issue Type: New Feature
          Components: Local/Commit Log
            Reporter: Josh McKenzie
            Assignee: Amit Pawar
         Attachments: UseDirectIOFeatureForCommitLogFiles.patch

Relocating from [dev@ email 
thread.|https://lists.apache.org/thread/j6ny17q2rhkp7jxvwxm69dd6v1dozjrg]

 

I shared my investigation about Commitlog I/O issue on large core count system 
in my previous email dated July-22 and link to the thread is given below.
[https://lists.apache.org/thread/xc5ocog2qz2v2gnj4xlw5hbthfqytx2n]

Basically, two solutions looked possible to improve the CommitLog I/O.
 # Multi-threaded syncing
 # Using Direct-IO through JNA

I worked on 2nd option considering the following benefit compared to the first 
one
 # Direct I/O read/write throughput is very high compared to non-Direct I/O. 
Learnt through FIO benchmarking.
 # Reduces kernel file cache uses which in-turn reduces kernel I/O activity for 
Commitlog files only.
 # Overall CPU usage reduced for flush activity. JVisualvm shows CPU usage < 
30% for Commitlog syncer thread with Direct I/O feature
 # Direct I/O implementation is easier compared to multi-threaded

As per the community suggestion, less in code complex is good to have. Direct 
I/O enablement looked promising but there was one issue. 
Java version 8 does not have native support to enable Direct I/O. So, JNA 
library usage is must. The same implementation should also work across other 
versions of Java (like 11 and beyond).

I have completed Direct I/O implementation and summary of the attached patch 
changes are given below.
 # This implementation is not using Java file channels and file is opened 
through JNA to use Direct I/O feature.
 # New Segment are defined named “DirectIOSegment”  for Direct I/O and 
“NonDirectIOSegment” for non-direct I/O (NonDirectIOSegment is test purpose 
only).
 # JNA write call is used to flush the changes.
 # New helper functions are defined in NativeLibrary.java and platform specific 
file. Currently tested on Linux only.
 # Patch allows user to configure optimum block size  and alignment if default 
values are not OK for CommitLog disk.
 # Following configuration options are provided in Cassandra.yaml file
a. use_jna_for_commitlog_io : to use jna feature
b. use_direct_io_for_commitlog : to use Direct I/O feature.
c. direct_io_minimum_block_alignment: 512 (default)
d. nvme_disk_block_size: 32MiB (default and can be changed as per the required 
size)

 Test matrix is complex so CommitLog related testcases and TPCx-IOT benchmark 
was tested. It works with both Java 8 and 11 versions. Compressed and Encrypted 
based segments are not supported yet and it can be enabled later based on the 
Community feedback.

 Following improvement are seen with Direct I/O enablement.
 # 32 cores >= ~15%
 # 64 cores >= ~80%

 Also, another observation would like to share here. Reading Commitlog files 
with Direct I/O might help in reducing node bring-up time after the node crash.

 Tested with commit ID: 91f6a9aca8d3c22a03e68aa901a0b154d960ab07

 The attached patch enables Direct I/O feature for Commitlog files. Please 
check and share your feedback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to