arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371309009
 
 

 ##########
 File path: extensions/rocksdb-repos/FlowFileRepository.h
 ##########
 @@ -103,6 +105,9 @@ class FlowFileRepository : public core::Repository, public 
std::enable_shared_fr
     options.create_if_missing = true;
     options.use_direct_io_for_flush_and_compaction = true;
     options.use_direct_reads = true;
+    options.write_buffer_size = 8 << 20;
 
 Review comment:
   This is the important part of this PR.
   
   When operations are performed in rocksdb, it first stores them in an 
in-memory buffer as an unsorted list of events. When the buffer fills up, these 
events are merged and serialized, and the resulting records are written to disk 
in rocksdb's regular on-disk structure. 
   
   In our case it means two things:
   - During our regular use case the content of the buffer is continuously 
growing, as the creation and later deletion of the same element result in two 
events being added to the buffer (log). 
   - When the buffer is full and the events are merged, there is a CPU spike, 
and the result (nothing or nearly nothing in our case) is written to the 
underlying storage. 
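
   The two points above can be illustrated with a small, self-contained model. 
This is only a sketch of the behaviour described here, not rocksdb code: 
`WriteBufferModel`, `Event`, `put`, `remove` and `flush` are hypothetical names, 
and the model ignores details such as tombstones that a real database may still 
need to persist.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the write buffer: an append-only log of events.
struct Event {
  std::string key;
  bool is_delete;
  std::string value;
};

class WriteBufferModel {
 public:
  // Creating an element appends one event to the log.
  void put(const std::string& key, const std::string& value) {
    log_.push_back({key, false, value});
  }

  // Deleting an element appends a second event; nothing is reclaimed yet.
  void remove(const std::string& key) {
    log_.push_back({key, true, ""});
  }

  std::size_t bufferedEvents() const { return log_.size(); }

  // Merge the log in order: later events override earlier ones, and a key
  // whose last event is a delete does not survive. Returns what would be
  // written to storage and empties the buffer.
  std::map<std::string, std::string> flush() {
    std::map<std::string, std::string> survivors;
    for (const auto& event : log_) {
      if (event.is_delete) {
        survivors.erase(event.key);
      } else {
        survivors[event.key] = event.value;
      }
    }
    log_.clear();
    return survivors;
  }

 private:
  std::vector<Event> log_;
};
```

   In this model, creating and then deleting 1000 elements leaves 2000 events 
in the buffer, yet `flush()` returns an empty map: the buffer grows the whole 
time, and the merge does work only to write nothing.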
   
   As this buffer only contains flowfile metadata (attributes and content 
location), it fills up quite slowly: it takes hours when 10 flowfiles are 
generated per second. 
   
   To keep memory usage from going that high (the FF repo can consume more 
than the rest of MiNiFi combined when the buffer is close to full) and to get 
smaller CPU peaks, the buffer is now emptied more frequently. Filling it still 
takes minutes under a decent load. 
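
   As a rough back-of-the-envelope check of the "hours" versus "minutes" 
figures, the fill time can be estimated as below. This is a sketch under stated 
assumptions: `secondsToFill` is a hypothetical helper, the 200 bytes per event 
is an invented number rather than a measurement, and 64 MB is rocksdb's 
documented default for `write_buffer_size`, assumed here to be what applied 
before this change.

```cpp
#include <cassert>
#include <cstdint>

// rocksdb's documented default write_buffer_size (assumed previous value).
constexpr std::uint64_t kDefaultBufferBytes = 64ull << 20;
// Value set in this PR: 8 << 20 bytes = 8 MiB.
constexpr std::uint64_t kNewBufferBytes = 8ull << 20;

// Estimated whole seconds until the buffer is full, assuming every flowfile
// contributes two events (creation and deletion) of bytes_per_event each.
inline std::uint64_t secondsToFill(std::uint64_t buffer_bytes,
                                   std::uint64_t flowfiles_per_sec,
                                   std::uint64_t bytes_per_event) {
  const std::uint64_t bytes_per_sec = flowfiles_per_sec * 2 * bytes_per_event;
  assert(bytes_per_sec > 0);  // avoid dividing by zero
  return buffer_bytes / bytes_per_sec;
}
```

   With 10 flowfiles per second and the assumed 200 bytes per event, 
`secondsToFill(kDefaultBufferBytes, 10, 200)` gives 16777 seconds (about 4.7 
hours), while `secondsToFill(kNewBufferBytes, 10, 200)` gives 2097 seconds 
(about 35 minutes).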
   
   For more information check this: 
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#memtable
   (Note: write amplification doesn't matter in our case, as we usually persist 
a negligible amount of data because flowfiles leave the system within a very 
short time.)
   
   The logging added in this PR also reports the current write buffer usage, 
which helps with monitoring the peaks. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
