achristianson opened a new pull request #603: MINIFICPP-929 mmap
URL: https://github.com/apache/nifi-minifi-cpp/pull/603
 
 
   ****--DRAFT PR please review... lots of code changes, so many eyes are 
welcome--****
   
   This PR adds an mmap() interface to allow processors to map FlowFile payloads 
to a memory address. This increases efficiency and performance significantly 
for some use cases. In almost all cases the change does not negatively impact 
performance, as shown in the benchmarks below.
   
   Original/full reason/justification:
   
   "Currently, MiNiFi - C++ only supports stream-oriented I/O to FlowFile 
payloads. This can limit performance in cases where in-place access to the 
payload is desirable. In cases where data can be accessed randomly and 
in-place, a significant speedup can be realized by mapping the payload into 
system memory address space. This is natively supported at the kernel level in 
Linux and macOS via the mmap() interface on files, and in Windows via its 
file-mapping API. Other repositories, such as the VolatileRepository, already 
store the entire payload in memory, so it is natural to pass through this 
memory block as if it were a memory-mapped file. While the 
DatabaseContentRepository does not appear to natively support a memory map 
interface, accesses via an emulated memory-map interface should be possible 
with no performance degradation with respect to a full read via the streaming 
interface.
   
   Cases where in-place, random access is beneficial include, but are not 
limited to:
   
   - in-place parsing of JSON (e.g. RapidJSON supports parsing in-place, at 
least for strings)
   - access of payload via protocol buffers
   - random access of large files on disk, where it would otherwise require 
many seek() and read() syscalls
   
   The interface should be accessible by processors via a mmap() call on 
ProcessSession (adjacent to read() and write()). A MemoryMapCallback should be 
provided, which is called back via a process() call where the argument is an 
instance of BaseMemoryMap. The BaseMemoryMap is extended for each type of 
repository that MiNiFi - C++ supports, including: FileSystemRepository, 
VolatileRepository, and DatabaseContentRepository.
   
   As part of the change, in addition to extensive unit test coverage, 
benchmarks should be written such that the performance impact can be 
empirically measured and evaluated."
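
   The callback-style interface described in the quoted justification could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the method names `getData()` and `getSize()`, the `InMemoryMap` and `LineCountCallback` classes, and the free function `session_mmap()` (standing in for a `mmap()` call on ProcessSession) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for BaseMemoryMap: a view of a FlowFile payload as a
// contiguous block of memory. Method names are assumptions.
class BaseMemoryMap {
 public:
  virtual ~BaseMemoryMap() = default;
  virtual void* getData() = 0;        // start of the mapped payload
  virtual std::size_t getSize() = 0;  // payload size in bytes
};

// Hypothetical map for an in-memory repository: the payload already lives in
// memory, so the existing buffer is passed through rather than copied.
class InMemoryMap : public BaseMemoryMap {
 public:
  explicit InMemoryMap(std::vector<std::uint8_t>& buf) : buf_(buf) {}
  void* getData() override { return buf_.data(); }
  std::size_t getSize() override { return buf_.size(); }

 private:
  std::vector<std::uint8_t>& buf_;
};

// Hypothetical callback type: process() receives the mapped payload.
class MemoryMapCallback {
 public:
  virtual ~MemoryMapCallback() = default;
  virtual std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) = 0;
};

// Example callback: count newline bytes in place, with no intermediate copy.
class LineCountCallback : public MemoryMapCallback {
 public:
  std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) override {
    const auto* p = static_cast<const std::uint8_t*>(map->getData());
    std::int64_t lines = 0;
    for (std::size_t i = 0; i < map->getSize(); ++i) {
      if (p[i] == '\n') ++lines;
    }
    return lines;
  }
};

// Hypothetical stand-in for a mmap() call on ProcessSession: build the map
// for the payload and hand it to the callback.
std::int64_t session_mmap(std::vector<std::uint8_t>& payload, MemoryMapCallback& cb) {
  return cb.process(std::make_shared<InMemoryMap>(payload));
}

// Usage: count the lines in a small three-line payload.
std::int64_t example() {
  std::string text = "a\nb\nc\n";
  std::vector<std::uint8_t> payload(text.begin(), text.end());
  assert(!payload.empty());
  LineCountCallback cb;
  return session_mmap(payload, cb);
}
```

   In this sketch a processor never sees which repository backs the payload. A file-backed or database-backed map would only need to provide the same two accessors.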
   
   Here is the full benchmark suite:
   
   ```
   
   -------------------------------------------------------------------------------------------------------------------
   Benchmark                                                                          Time             CPU   Iterations
   -------------------------------------------------------------------------------------------------------------------
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Tiny                   2956 ns         2923 ns       240558
   FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Tiny                    4258 ns         4227 ns       164835
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Tiny              7764 ns         7665 ns        91078
   FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Tiny              14152 ns        14022 ns        49870
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Small                 15671 ns        15631 ns        44870
   FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Small                  21020 ns        20977 ns        33246
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Small            59944 ns        59772 ns        11701
   FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Small             57354 ns        57152 ns        12237
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_Read_Large               3592536 ns      3587026 ns          194
   FSMemoryMapBMFixture/Callback_FileSystemRepository_Read_Large               17014790 ns     16979026 ns           41
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_WriteRead_Large         16578370 ns     16530633 ns           42
   FSMemoryMapBMFixture/Callback_FileSystemRepository_WriteRead_Large          26228637 ns     26159193 ns           27
   FSMemoryMapBMFixture/MemoryMap_FileSystemRepository_RandomRead_Large            53.7 ns         53.7 ns     13026678
   FSMemoryMapBMFixture/Callback_FileSystemRepository_RandomRead_Large           170905 ns       170829 ns         4074
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Tiny                267 ns          267 ns      2627306
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Tiny                 360 ns          360 ns      1957163
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Tiny           558 ns          558 ns      1255342
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Tiny           1024 ns         1024 ns       682374
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Small              2654 ns         2653 ns       254700
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Small               7920 ns         7916 ns        96029
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Small         7581 ns         7578 ns       105741
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Small         11594 ns        11590 ns        60342
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_Read_Large           2438303 ns      2434904 ns          286
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_Read_Large           14859059 ns     14838872 ns           47
   VolatileMemoryMapBMFixture/MemoryMap_VolatileRepository_WriteRead_Large      5889984 ns      5879759 ns          119
   VolatileMemoryMapBMFixture/Callback_VolatileRepository_WriteRead_Large      17126978 ns     17105183 ns           41
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Tiny                285 ns          285 ns      2469053
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Tiny                 254 ns          254 ns      2757993
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Tiny          4573 ns         4571 ns       150453
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Tiny           3553 ns         3551 ns       197460
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Small             12882 ns        12876 ns        54293
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Small              11930 ns        11925 ns        58679
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Small        88615 ns        88436 ns         7935
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Small         90748 ns        90548 ns         7717
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_Read_Large          26695310 ns     26666793 ns           26
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_Read_Large           26571426 ns     26544032 ns           26
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_WriteRead_Large     49532071 ns     49459516 ns           14
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_WriteRead_Large      62205023 ns     62085915 ns           12
   DatabaseMemoryMapBMFixture/MemoryMap_DatabaseRepository_RandomRead_Large        55.5 ns         55.4 ns     12612349
   DatabaseMemoryMapBMFixture/Callback_DatabaseRepository_RandomRead_Large          514 ns          514 ns      1094727
   ```
   
   The benchmarks show a significant performance increase in most cases for 
the FS and volatile repositories, both of which can natively support memory 
mapping. The DB repo has to simulate it by reading the full object. This has 
little performance impact in most cases, but is somewhat slower in some of the 
"tiny" and "small" (131 KB payload) benchmark cases, for example 4573 ns vs. 
3553 ns for WriteRead_Tiny. The random access benchmarks show the most 
significant increase, even with the DB repo (55.5 ns vs. 514 ns).
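
   One plausible reason for the gap in the random access cases is that a stream-style read costs a syscall per access, while a mapped payload is read with a plain memory load. The sketch below illustrates that difference with POSIX calls only (`open`, `fstat`, `mmap`, `munmap`, `pread`). It is not the implementation in this PR, and the names `ReadOnlyFileMap` and `read_byte_syscall` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Stream-style random access: one pread() syscall per byte requested.
// Returns the byte value, or -1 on error or end of file.
int read_byte_syscall(int fd, off_t offset) {
  unsigned char b = 0;
  ssize_t n = pread(fd, &b, 1, offset);
  return n == 1 ? b : -1;
}

// RAII wrapper around a read-only mapping of a whole file (hypothetical name).
class ReadOnlyFileMap {
 public:
  explicit ReadOnlyFileMap(const std::string& path) {
    fd_ = open(path.c_str(), O_RDONLY);
    if (fd_ < 0) return;
    struct stat st;
    // Mapping a zero-length file is an error, so treat it as "not mapped".
    if (fstat(fd_, &st) != 0 || st.st_size == 0) return;
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                   MAP_PRIVATE, fd_, 0);
    if (p == MAP_FAILED) return;
    data_ = static_cast<const unsigned char*>(p);
    size_ = static_cast<std::size_t>(st.st_size);
  }

  ~ReadOnlyFileMap() {
    if (data_ != nullptr) munmap(const_cast<unsigned char*>(data_), size_);
    if (fd_ >= 0) close(fd_);
  }

  ReadOnlyFileMap(const ReadOnlyFileMap&) = delete;
  ReadOnlyFileMap& operator=(const ReadOnlyFileMap&) = delete;

  bool valid() const { return data_ != nullptr; }
  std::size_t size() const { return size_; }
  int fd() const { return fd_; }

  // Once the mapping exists, random access is an ordinary memory read.
  unsigned char at(std::size_t offset) const {
    assert(offset < size_);
    return data_[offset];
  }

 private:
  int fd_ = -1;
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

   The mapping itself has a setup cost (`open`, `fstat`, `mmap`), so the benefit in this sketch grows with the number of accesses made per mapping.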
   
   Caveats:
   
   - No Windows build yet, although it should be possible 
(https://docs.microsoft.com/en-us/windows/desktop/memory/file-mapping). I 
mainly need a set of Windows build instructions to test and validate against, 
as there are several possible ways to do a Windows build.
   - There are some formatting changes due to clang-format (I included a 
.clang-format to hopefully reduce the issue going forward).
   - It's a fairly big code change, so there could be some other things I 
missed.
   - RocksDB is updated and needed RTTI to build on my machine. We can talk 
about this and/or extract the RocksDB update into a separate change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
