Andrew Christianson created MINIFICPP-929:
---------------------------------------------
Summary: Create memory map interface to flow files in
ProcessSession/ContentRepository
Key: MINIFICPP-929
URL: https://issues.apache.org/jira/browse/MINIFICPP-929
Project: Apache NiFi MiNiFi C++
Issue Type: Improvement
Reporter: Andrew Christianson
Assignee: Andrew Christianson
Currently, MiNiFi - C++ only supports stream-oriented I/O to FlowFile payloads.
This can limit performance in cases where in-place access to the payload is
desirable. In cases where data can be accessed randomly and in-place, a
significant speedup can be realized by mapping the payload into system memory
address space. This is natively supported at the kernel level on Linux and
macOS via the mmap() interface on files, and on Windows via the equivalent
CreateFileMapping()/MapViewOfFile() calls. Other repositories, such as the
VolatileRepository, already store the entire payload in memory, so it is
natural to pass through this memory block as if it were a memory-mapped file.
While the DatabaseContentRepository does not appear to natively support a memory
map interface, accesses via an emulated memory-map interface should be possible
with no performance degradation with respect to a full read via the streaming
interface.
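For illustration, a minimal sketch of what a file-backed mapping could look like
on POSIX systems (Linux, macOS). The class name ReadOnlyFileMap is hypothetical
and not part of MiNiFi - C++; only open(), fstat(), mmap(), munmap() and close()
are real system calls. Windows would need a separate implementation.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a read-only POSIX file mapping.
// Not an existing MiNiFi - C++ class; names are assumptions for illustration.
class ReadOnlyFileMap {
 public:
  explicit ReadOnlyFileMap(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) {
      throw std::runtime_error("open failed: " + path);
    }
    struct stat st {};
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {  // mmap() rejects zero-length mappings
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const char*>(p);
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }

  ~ReadOnlyFileMap() {
    if (data_ != nullptr) {
      ::munmap(const_cast<char*>(data_), size_);
    }
  }

  ReadOnlyFileMap(const ReadOnlyFileMap&) = delete;
  ReadOnlyFileMap& operator=(const ReadOnlyFileMap&) = delete;

  const char* data() const { return data_; }  // nullptr for an empty file
  std::size_t size() const { return size_; }

 private:
  const char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

A repository that already holds the payload in memory would skip the system
calls entirely and hand out its existing buffer through the same accessors.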
Cases where in-place, random access is beneficial include, but are not limited
to:
* in-place parsing of JSON (e.g. RapidJSON supports parsing in-place, at least
for strings).
* access of payload via protocol buffers
* random access of large files on disk, where it would otherwise require many
seek() and read() syscalls
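To make the random-access case concrete, here is a small sketch of how reads
against an already-mapped payload reduce to bounds-checked pointer arithmetic,
with no seek() or read() call per access. The helper names read_u32_at and
slice are hypothetical, and the integer is read in native byte order for
simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>

// Hypothetical helper: read a native-endian 32-bit value at a byte offset
// inside a mapped payload. Returns nullopt if the read would go out of bounds.
std::optional<std::uint32_t> read_u32_at(const char* base, std::size_t size,
                                         std::size_t offset) {
  if (offset > size || size - offset < sizeof(std::uint32_t)) {
    return std::nullopt;
  }
  std::uint32_t value;
  std::memcpy(&value, base + offset, sizeof(value));  // safe for unaligned offsets
  return value;
}

// Hypothetical helper: view a sub-range of the payload in place, without
// copying. Returns an empty view if the range is out of bounds.
std::string_view slice(const char* base, std::size_t size, std::size_t offset,
                       std::size_t len) {
  if (offset > size || size - offset < len) {
    return {};
  }
  return std::string_view(base + offset, len);
}
```

The returned string_view is only valid while the mapping is alive, so a
processor would need to finish with it before the map is released.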
The interface should be accessible by processors via a mmap() call on
ProcessSession (adjacent to read() and write()). A MemoryMapCallback should be
provided, which is called back via a process() call where the argument is an
instance of BaseMemoryMap. The BaseMemoryMap is extended for each type of
repository that MiNiFi - C++ supports, including: FileSystemRepository,
VolatileRepository, and DatabaseContentRepository.
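A rough sketch of how the proposed pieces might fit together follows. Only the
names BaseMemoryMap, MemoryMapCallback, process() and mmap() come from the
description above; every signature, the getData()/getSize() accessors, the
VolatileMemoryMap subclass, the UppercaseCallback example and the SketchSession
stand-in for ProcessSession are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Assumed shape of the base map type: a pointer plus a length.
class BaseMemoryMap {
 public:
  virtual ~BaseMemoryMap() = default;
  virtual void* getData() = 0;
  virtual std::size_t getSize() const = 0;
};

// Assumed pass-through map for a repository that already keeps the payload
// in memory: it exposes the existing buffer instead of copying it.
class VolatileMemoryMap : public BaseMemoryMap {
 public:
  explicit VolatileMemoryMap(std::vector<std::uint8_t>& buf) : buf_(buf) {}
  void* getData() override { return buf_.data(); }
  std::size_t getSize() const override { return buf_.size(); }

 private:
  std::vector<std::uint8_t>& buf_;
};

// Assumed callback shape: process() receives the map and returns a byte count.
class MemoryMapCallback {
 public:
  virtual ~MemoryMapCallback() = default;
  virtual std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) = 0;
};

// Stand-in for ProcessSession, holding a single in-memory payload.
class SketchSession {
 public:
  explicit SketchSession(std::vector<std::uint8_t> payload)
      : payload_(std::move(payload)) {}

  // Hypothetical mmap() entry point, adjacent in spirit to read()/write().
  std::int64_t mmap(MemoryMapCallback& callback) {
    auto map = std::make_shared<VolatileMemoryMap>(payload_);
    return callback.process(map);
  }

  const std::vector<std::uint8_t>& payload() const { return payload_; }

 private:
  std::vector<std::uint8_t> payload_;
};

// Example processor-side callback: upper-cases ASCII letters in place.
class UppercaseCallback : public MemoryMapCallback {
 public:
  std::int64_t process(const std::shared_ptr<BaseMemoryMap>& map) override {
    auto* bytes = static_cast<std::uint8_t*>(map->getData());
    for (std::size_t i = 0; i < map->getSize(); ++i) {
      if (bytes[i] >= 'a' && bytes[i] <= 'z') {
        bytes[i] = static_cast<std::uint8_t>(bytes[i] - 'a' + 'A');
      }
    }
    return static_cast<std::int64_t>(map->getSize());
  }
};
```

In this sketch the callback mutates the payload directly through the map,
which is the in-place access pattern the stream interface cannot offer. A
file-backed or database-backed map would subclass BaseMemoryMap in the same way.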
As part of the change, in addition to extensive unit test coverage, benchmarks
should be written such that the performance impact can be empirically measured
and evaluated.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)