[
https://issues.apache.org/jira/browse/MINIFICPP-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861000#comment-16861000
]
Daniel Bakai commented on MINIFICPP-550:
----------------------------------------
[~phrocker]
I have been thinking about this, and I will try to summarize my thoughts.
The one thing I am sure about is that we need key-value storage.
There is a utilization perspective: we need key-value storage for different
use cases and these use cases require different properties from the key-value
storage.
And there is an implementation perspective: we can implement key-value
storage with different properties, using different third-party or
homegrown implementations.
The purposes we need key-value storage for:
* storing processor states
* storing arbitrary data
** for metrics (or other information to be sent via SiteToSite)
** for variable registry
The different properties of a key-value storage our use-cases could require:
* persistence
** volatile
** persistable (stored in memory but can be persisted)
*** persisted periodically by a worker
*** persisted manually or on a trigger
** persistent
* scope
** local
** cluster (remote)
As we do not currently have plans to clusterize MiNiFi, I think we can assume
local scope for now.
For storing processor states we need a persistable storage (or a persistent
one, if it is performant enough).
For metrics, a volatile storage would be enough, but a persistable storage
might be advantageous, if it is performant enough.
The variable registry would have to be persisted somehow.
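To make the property list above concrete, here is a minimal sketch of how the persistence and scope properties could be expressed in C++. All names (Persistence, Scope, satisfies, KeyValueStore, VolatileKeyValueStore) are hypothetical and do not exist in MiNiFi C++. The ordering volatile < persistable < persistent is an assumption drawn from the requirements stated above, where a stronger storage is acceptable wherever a weaker one is enough.

```cpp
#include <map>
#include <string>

// Hypothetical names throughout; nothing here is an existing MiNiFi C++ API.
enum class Persistence { Volatile = 0, Persistable = 1, Persistent = 2 };
enum class Scope { Local, Cluster };

// Assumption: a storage with a stronger persistence level also satisfies
// a use case that only requires a weaker one.
inline bool satisfies(Persistence offered, Persistence required) {
  return static_cast<int>(offered) >= static_cast<int>(required);
}

// Common interface, independent of the backing implementation.
class KeyValueStore {
 public:
  virtual ~KeyValueStore() = default;
  virtual Persistence persistence() const = 0;
  virtual Scope scope() const = 0;
  virtual void set(const std::string& key, const std::string& value) = 0;
  // Returns false and leaves value untouched if the key is absent.
  virtual bool get(const std::string& key, std::string& value) const = 0;
  // Returns true if the key existed and was removed.
  virtual bool remove(const std::string& key) = 0;
};

// Simplest possible implementation: local scope, in memory only.
class VolatileKeyValueStore : public KeyValueStore {
 public:
  Persistence persistence() const override { return Persistence::Volatile; }
  Scope scope() const override { return Scope::Local; }
  void set(const std::string& key, const std::string& value) override {
    data_[key] = value;
  }
  bool get(const std::string& key, std::string& value) const override {
    auto it = data_.find(key);
    if (it == data_.end()) {
      return false;
    }
    value = it->second;
    return true;
  }
  bool remove(const std::string& key) override {
    return data_.erase(key) > 0;
  }

 private:
  std::map<std::string, std::string> data_;
};
```

A persistable or persistent variant (for example one backed by RocksDB) would implement the same interface and report a different persistence level, so a caller can check satisfies(store.persistence(), required) before using it.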
I looked at how NiFi implements processor state storage.
It has a StateManagerProvider interface implemented by default with
StandardStateManagerProvider which is a singleton, and contains a local and a
cluster scoped StateProvider.
How these StateProviders are implemented can be defined from configuration.
The default implementation for the localStateProvider is
WriteAheadLocalStateProvider, which is a persistable key-value storage that is
periodically persisted by a worker, similar to what we can achieve with
RocksDB.
The StateManagerProvider can provide StateManagers using either the local or
the cluster StateProvider.
One StateManager instance is bound to a particular componentId (and how the
StateProvider manages that is left for the implementation, which is one good
idea we should follow: the implementation knows better whether it should be a
partition key, a different database, etc.).
The component (processor) can retrieve its StateManager from the
ProcessContext (which in turn gets it from the StateManagerProvider) and use
it to get and set its state.
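The relationship between these pieces can be sketched in C++ as follows. This is only an illustration of the structure described above, not NiFi's actual Java API and not existing MiNiFi C++ code: the class names are borrowed from NiFi, while the method names, the State alias and InMemoryStateProvider are assumptions made for the example. The ProcessContext step and the cluster scope are left out for brevity.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch; method names and signatures are assumptions.
using State = std::map<std::string, std::string>;

// Backing storage. How a component id maps onto the storage (partition key,
// separate database, ...) is left to the implementation.
class StateProvider {
 public:
  virtual ~StateProvider() = default;
  virtual void setState(const std::string& componentId, const State& state) = 0;
  virtual State getState(const std::string& componentId) const = 0;
};

// Stand-in implementation that keeps one map per component in memory.
class InMemoryStateProvider : public StateProvider {
 public:
  void setState(const std::string& componentId, const State& state) override {
    states_[componentId] = state;
  }
  State getState(const std::string& componentId) const override {
    auto it = states_.find(componentId);
    if (it == states_.end()) {
      return State();
    }
    return it->second;
  }

 private:
  std::map<std::string, State> states_;
};

// Bound to exactly one component; the component never sees other ids.
class StateManager {
 public:
  StateManager(std::string componentId, std::shared_ptr<StateProvider> provider)
      : componentId_(std::move(componentId)), provider_(std::move(provider)) {}
  void set(const State& state) { provider_->setState(componentId_, state); }
  State get() const { return provider_->getState(componentId_); }

 private:
  std::string componentId_;
  std::shared_ptr<StateProvider> provider_;
};

// Hands out StateManagers that share the local StateProvider.
class StateManagerProvider {
 public:
  explicit StateManagerProvider(std::shared_ptr<StateProvider> local)
      : local_(std::move(local)) {}
  StateManager getStateManager(const std::string& componentId) const {
    return StateManager(componentId, local_);
  }

 private:
  std::shared_ptr<StateProvider> local_;
};
```

Because the component id is fixed when the StateManager is created, a processor can only read and write its own state, and the StateProvider behind it can be swapped (for example for a RocksDB-backed one) without touching processor code.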
I think something like this would be a convenient way for us to store processor
state as well. This can either be implemented as a controller service, or - as
in NiFi - a completely different architecture.
Thinking about the other use cases, those could probably be satisfied with
having a controller service which can provide key-value stores with different
properties (volatile, persistable, etc.).
You write that the requirements can be split into two parts: a controller
service and a singleton that defines the cache (which the controller service
presumably uses).
I think that having a single cache instance implemented with something might
not be enough: if we need caches with different persistence, or we want to have
persistable caches that are persisted at different times (which we probably
do), we need to have multiple cache instances.
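As a rough illustration of the multiple-instance idea, the sketch below shows a service that hands out independently configured caches by name. Cache and CacheService are hypothetical names, and persist() only copies the data into an in-memory snapshot as a stand-in for writing to disk or RocksDB.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch; none of these types exist in MiNiFi C++.
class Cache {
 public:
  explicit Cache(bool persistable) : persistable_(persistable) {}

  void set(const std::string& key, const std::string& value) {
    data_[key] = value;
  }

  // Stand-in for durable storage: snapshots the current contents.
  // Returns false for a volatile cache, which is never persisted.
  bool persist() {
    if (!persistable_) {
      return false;
    }
    persisted_ = data_;
    return true;
  }

  std::size_t persistedSize() const { return persisted_.size(); }

 private:
  bool persistable_;
  std::map<std::string, std::string> data_;
  std::map<std::string, std::string> persisted_;
};

// Provides one cache instance per name, so each consumer can have its own
// persistence setting and its own persist schedule.
class CacheService {
 public:
  // The persistable flag only takes effect when the cache is first created.
  std::shared_ptr<Cache> getCache(const std::string& name, bool persistable) {
    auto it = caches_.find(name);
    if (it != caches_.end()) {
      return it->second;
    }
    auto cache = std::make_shared<Cache>(persistable);
    caches_[name] = cache;
    return cache;
  }

 private:
  std::map<std::string, std::shared_ptr<Cache>> caches_;
};
```

With this shape, processor state and metrics can live in separate instances: persisting one does not touch the other, which a single shared cache could not offer.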
What do you think? Could you please elaborate on the controller service -
singleton cache architecture you mentioned?
> Create RocksDB Controller Service
> ---------------------------------
>
> Key: MINIFICPP-550
> URL: https://issues.apache.org/jira/browse/MINIFICPP-550
> Project: Apache NiFi MiNiFi C++
> Issue Type: Bug
> Reporter: Mr TheSegfault
> Assignee: Daniel Bakai
> Priority: Major
> Fix For: 0.7.0
>
>
> A RocksDB Controller service will give us the ability to store arbitrary
> information into controller services that can later be sent via SiteToSite.
> This will support many of my monitoring and test use cases. Using RocksDB as
> a key/value store we can serialize and send this information periodically