[ 
https://issues.apache.org/jira/browse/MINIFICPP-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861000#comment-16861000
 ] 

Daniel Bakai commented on MINIFICPP-550:
----------------------------------------

[~phrocker]

I have been thinking about this, and I will try to summarize my thoughts.
 The one thing I am sure about is that we need key-value storage.
 There is a utilization perspective: we need key-value storage for different 
use cases and these use cases require different properties from the key-value 
storage.
 And there is an implementation perspective: we can implement a key-value 
storage with different properties, using different third-party or 
homegrown implementations.

The purposes we need key-value storage for:
 * storing processor states
 * storing arbitrary data
 ** for metrics (or other information to be sent via SiteToSite)
 ** for variable registry

The different properties of a key-value storage our use-cases could require:
 * persistence
 ** volatile
 ** persistable (stored in memory but can be persisted)
 *** persisted periodically by a worker
 *** persisted manually or on a trigger
 ** persistent
 * scope
 ** local
 ** cluster (remote)

 
 As we do not currently have plans to clusterize MiNiFi, I think we can assume 
local scope for now.
 For storing processor states we need a persistable storage (or a persistent 
one, if it is performant enough).
 For metrics, a volatile storage would be enough, but a persistable storage 
might be advantageous, if it is performant enough.
 Variable registry would have to be persisted somehow.
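
To make the classification above concrete, here is a rough sketch of how the properties and the per-use-case requirements could be written down in C++. All names (Persistence, Scope, StorageRequirements, requirementsFor, satisfies) are hypothetical and do not exist in MiNiFi; treating "persistable" as the minimum for the variable registry is my reading of "persisted somehow".

```cpp
#include <stdexcept>
#include <string>

// Hypothetical names, for illustration only.
// Ordered from least to most durable so that levels can be compared.
enum class Persistence {
  Volatile,     // in memory only, lost on restart
  Persistable,  // in memory, flushed periodically or on a trigger
  Persistent    // every write goes to durable storage
};

enum class Scope { Local, Cluster };

struct StorageRequirements {
  Persistence minimum_persistence;  // weakest level that is still acceptable
  Scope scope;
};

// Requirements per use case as listed above; scope is Local for now.
// The use-case strings are assumptions made up for this sketch.
inline StorageRequirements requirementsFor(const std::string& use_case) {
  if (use_case == "processor-state") return {Persistence::Persistable, Scope::Local};
  if (use_case == "metrics") return {Persistence::Volatile, Scope::Local};
  if (use_case == "variable-registry") return {Persistence::Persistable, Scope::Local};
  throw std::invalid_argument("unknown use case: " + use_case);
}

// A store satisfies a requirement if it is at least as durable as requested.
inline bool satisfies(Persistence offered, Persistence required) {
  return static_cast<int>(offered) >= static_cast<int>(required);
}
```

With this, a persistent store would be acceptable for every use case, while a volatile one would only be acceptable for metrics.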

I looked at how NiFi implements processor state storage.
 It has a StateManagerProvider interface implemented by default with 
StandardStateManagerProvider which is a singleton, and contains a local and a 
cluster scoped StateProvider.
 How these StateProviders are implemented is defined in configuration. 
The default implementation for the localStateProvider is 
WriteAheadLocalStateProvider, which is a persistable key-value storage that 
is periodically persisted by a worker, similar to what we can achieve with 
RocksDB.
 The StateManagerProvider can provide StateManagers using either the local or 
the cluster StateProvider.
 One StateManager instance is bound to a particular componentId (and how the 
StateProvider manages that is left for the implementation, which is one good 
idea we should follow: the implementation knows better whether it should be a 
partition key, a different database, etc.).
 The component (processor) can retrieve its StateManager from the 
ProcessContext (which in turn gets it from the StateManagerProvider) and use 
it to get and set its state.
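
A minimal C++ sketch of that structure, to show how the pieces relate. The class names mirror the NiFi ones, but the signatures are my own simplification and not the actual NiFi or MiNiFi API; InMemoryStateProvider is a made-up stand-in for a real provider, and the cluster scope is left out.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical C++ analogue of the NiFi classes; not an existing MiNiFi API.
using State = std::map<std::string, std::string>;

class StateProvider {
 public:
  virtual ~StateProvider() = default;
  virtual void setState(const std::string& component_id, const State& state) = 0;
  virtual State getState(const std::string& component_id) const = 0;
};

// Simplest possible provider: in memory, partitioned by component id.
// How the component id is used is the provider's own decision.
class InMemoryStateProvider : public StateProvider {
 public:
  void setState(const std::string& component_id, const State& state) override {
    states_[component_id] = state;
  }
  State getState(const std::string& component_id) const override {
    auto it = states_.find(component_id);
    return it == states_.end() ? State{} : it->second;
  }

 private:
  std::map<std::string, State> states_;
};

// Bound to one component; it cannot reach other components' state.
class StateManager {
 public:
  StateManager(std::shared_ptr<StateProvider> provider, std::string component_id)
      : provider_(std::move(provider)), component_id_(std::move(component_id)) {}
  void set(const State& state) { provider_->setState(component_id_, state); }
  State get() const { return provider_->getState(component_id_); }

 private:
  std::shared_ptr<StateProvider> provider_;
  std::string component_id_;
};

// Hands out StateManagers backed by the local provider.
class StateManagerProvider {
 public:
  explicit StateManagerProvider(std::shared_ptr<StateProvider> local)
      : local_(std::move(local)) {}
  StateManager getStateManager(const std::string& component_id) const {
    return StateManager(local_, component_id);
  }

 private:
  std::shared_ptr<StateProvider> local_;
};
```

A processor would only ever see its own StateManager, so swapping InMemoryStateProvider for a RocksDB-backed provider would not change processor code.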

I think something like this would be a convenient way for us to store processor 
state as well. This can either be implemented as a controller service, or - as 
in NiFi - a completely different architecture.

 Thinking about the other use cases, those could probably be satisfied with 
having a controller service which can provide key-value stores with different 
properties (volatile, persistable, etc.).
 You write that the requirements can be split into two parts: a controller 
service and a singleton that defines the cache (which the controller service 
presumably uses).
 I think that having a single cache instance backed by one implementation 
might not be enough: if we need caches with different persistence, or we want 
to have persistable caches that are persisted at different times (which we 
probably do), we need to have multiple cache instances.
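
Roughly what I have in mind for multiple cache instances, as a sketch only. KeyValueStore, PersistableStore and KeyValueStoreService are hypothetical names, and persist() just snapshots the in-memory map as a stand-in for a real write to disk or RocksDB.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch; none of these classes exist in MiNiFi.
class KeyValueStore {
 public:
  virtual ~KeyValueStore() = default;
  void put(const std::string& key, const std::string& value) { data_[key] = value; }
  bool get(const std::string& key, std::string& value) const {
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    value = it->second;
    return true;
  }
  // Volatile stores have nothing to flush; persistable stores override this.
  virtual void persist() {}

 protected:
  std::map<std::string, std::string> data_;
};

class PersistableStore : public KeyValueStore {
 public:
  // Stand-in for a durable write: snapshot the current in-memory data.
  void persist() override { persisted_ = data_; }
  const std::map<std::string, std::string>& persisted() const { return persisted_; }

 private:
  std::map<std::string, std::string> persisted_;
};

// The controller service owns several named stores instead of one singleton.
class KeyValueStoreService {
 public:
  // Returns the existing store for this name, or creates one of the requested kind.
  std::shared_ptr<KeyValueStore> getStore(const std::string& name, bool persistable) {
    auto it = stores_.find(name);
    if (it != stores_.end()) return it->second;
    std::shared_ptr<KeyValueStore> store;
    if (persistable) {
      store = std::make_shared<PersistableStore>();
    } else {
      store = std::make_shared<KeyValueStore>();
    }
    stores_[name] = store;
    return store;
  }

 private:
  std::map<std::string, std::shared_ptr<KeyValueStore>> stores_;
};
```

Each store can then be persisted on its own schedule or trigger, independently of the others.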

What do you think? Could you please elaborate on the controller service - 
singleton cache architecture you mentioned?

> Create RocksDB Controller Service
> ---------------------------------
>
>                 Key: MINIFICPP-550
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-550
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: Bug
>            Reporter: Mr TheSegfault
>            Assignee: Daniel Bakai
>            Priority: Major
>             Fix For: 0.7.0
>
>
> A RocksDB Controller service will give us the ability to store arbitrary 
> information into controller services that can later be sent via SiteToSite. 
> This will support many of my monitoring and test use cases. Using RocksDB as  
> a key/value store we can serialize and send this information periodically



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
