[
https://issues.apache.org/jira/browse/MINIFICPP-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861025#comment-16861025
]
Mr TheSegfault commented on MINIFICPP-550:
------------------------------------------
Thanks [~bakaid] for that thorough comment.
Controller services can be shared with any component. The impetus for some of
my comments was that these aren't necessarily component states, but shared
states. While we could have per-processor states, we may also facilitate
communication between processors or controller services. Variable registry is a
trivial example of this. Another example may be storing a list of blocked IPs
that arise from one processor but are shared with subsequent processors in the
graph.
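To make the blocked-IP example concrete, here is a minimal sketch of two
processors sharing state through one service object. Every name in it
(SharedStateService, DetectProcessor, FilterProcessor) is hypothetical and
made up for illustration; none of this is the existing MiNiFi C++ controller
service API.

```cpp
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a controller service that owns shared state.
// This is NOT the MiNiFi C++ ControllerService interface.
class SharedStateService {
 public:
  void blockIp(const std::string& ip) {
    std::lock_guard<std::mutex> lock(mutex_);
    blocked_.insert(ip);
  }

  bool isBlocked(const std::string& ip) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return blocked_.count(ip) > 0;
  }

 private:
  mutable std::mutex mutex_;       // processors may run on different threads
  std::set<std::string> blocked_;  // state shared across the graph
};

// Hypothetical upstream processor: records offending source addresses.
class DetectProcessor {
 public:
  explicit DetectProcessor(std::shared_ptr<SharedStateService> state)
      : state_(std::move(state)) {}

  void onTrigger(const std::string& source_ip, bool suspicious) {
    if (suspicious) state_->blockIp(source_ip);
  }

 private:
  std::shared_ptr<SharedStateService> state_;
};

// Hypothetical downstream processor: consults the same shared list.
class FilterProcessor {
 public:
  explicit FilterProcessor(std::shared_ptr<SharedStateService> state)
      : state_(std::move(state)) {}

  bool accept(const std::string& source_ip) const {
    return !state_->isBlocked(source_ip);
  }

 private:
  std::shared_ptr<SharedStateService> state_;
};
```

Neither processor owns the list here. Both hold a reference to the same
service, which is the property that per-processor state alone would not give
us.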
The paradigm from NiFi isn't one that I'm against. The "singleton cache" was
meant to reflect a similar idea, but one in which we don't need to augment
ProcessContext, since that coupling didn't reach all aspects of what I was
hoping we could achieve with having reporting tasks and other controller
services update the stored state. If I recall the history correctly, I believe
StateManager originated with this idea, but the coupling of StateManager to
ProcessContext was put in place since no other sensible method of retrieval
existed. This is where the singleton idea originated.
Controller services complete both aspects as we can have "known controller
service names." For example, we can have a property in the minifi.properties
file that defines a controller service name that specifies a state manager. This
can be used by virtually any component. Other benefits of this include being
able to easily turn this on/off via command and control (by updating the flow).
It also allows us to more easily change the type. The negative here is that the
config yaml file defines the state manager implementation. I rationalized
this via the concept that the config yaml file already defines not only the
graph but also the state of the agent. Retrieval can be made via process
context on getting the provider, and we can still have an option to store for
only the processor or shared amongst many. Using linked services it would also
be possible to intersect results from a volatile and a persistent repo. While
RocksDB can be configured so that the WAL is off or highly delayed, this may
allow us to do interesting things if using controller services. Finally, there
is also precedent that we can inject (maybe through properties) a default
controller service if none is specified. This will allow us to have it
configurable via command while still having one defined if not specified in the
update.
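A rough sketch of how that lookup could behave is below: resolve a state
provider by a configured service name, fall back to an injected default when
nothing matches, and let the caller choose a processor-local or shared scope
on top of the same provider. The types (StateProvider, VolatileStateProvider,
ServiceRegistry, ScopedState) and the scope-prefix key layout are assumptions
for illustration only, not existing MiNiFi C++ classes, and the in-memory
provider is a stand-in for whatever the real backing store would be.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical key/value state provider interface.
class StateProvider {
 public:
  virtual ~StateProvider() = default;
  virtual void set(const std::string& key, const std::string& value) = 0;
  virtual bool get(const std::string& key, std::string& value) const = 0;
};

// Hypothetical in-memory provider (not thread safe, illustration only).
class VolatileStateProvider : public StateProvider {
 public:
  void set(const std::string& key, const std::string& value) override {
    store_[key] = value;
  }

  bool get(const std::string& key, std::string& value) const override {
    auto it = store_.find(key);
    if (it == store_.end()) return false;
    value = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> store_;
};

// Hypothetical registry of controller services keyed by name.
class ServiceRegistry {
 public:
  void add(const std::string& name, std::shared_ptr<StateProvider> provider) {
    services_[name] = std::move(provider);
  }

  // Returns the named provider. If the name is empty or unknown, a default
  // provider is created lazily and returned instead (assumed behavior).
  std::shared_ptr<StateProvider> resolve(const std::string& configured_name) {
    auto it = services_.find(configured_name);
    if (it != services_.end()) return it->second;
    if (!default_) default_ = std::make_shared<VolatileStateProvider>();
    return default_;
  }

 private:
  std::map<std::string, std::shared_ptr<StateProvider>> services_;
  std::shared_ptr<StateProvider> default_;
};

// View over a provider that prefixes every key with a scope, so one provider
// can hold both processor-local and shared entries.
class ScopedState {
 public:
  ScopedState(std::shared_ptr<StateProvider> provider, std::string scope)
      : provider_(std::move(provider)), scope_(std::move(scope)) {}

  void set(const std::string& key, const std::string& value) {
    provider_->set(scope_ + "/" + key, value);
  }

  bool get(const std::string& key, std::string& value) const {
    return provider_->get(scope_ + "/" + key, value);
  }

 private:
  std::shared_ptr<StateProvider> provider_;
  std::string scope_;
};
```

With something like this, a processor that wants local state would use its own
identifier as the scope, while processors that want to share would agree on a
common scope name. Swapping the implementation behind the configured name
would not change either caller.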
There are a lot of ideas, excited to see what you come up with. My comment
about configuring RocksDB to be mostly in mem doesn't ignore the fact that,
yes, there may certainly be portions of state that aren't ever going to be
persisted. In these cases those extension points could/should allow for
different state manager types, if we can use that same nomenclature. I don't
have an overly strong preference in any of these decision points, except that
I've always felt that coupling ProcessContext with StateManager was never the
right answer since state can often be more than local, and how does one resolve
that coupling if you want state that is processor local and processor
shareable? Perhaps even that could be made more graceful. Ultimately the
process context is defined as a 'bridge between the processor and the
nifi-framework' [1]. I would take that a step further and say that its scope
is limited to the lifetime of the thread, which in my opinion has
implications for how we handle state.
Hopefully that all made sense. Thanks
[1]
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/ProcessContext.java#L30
> Create RocksDB Controller Service
> ---------------------------------
>
> Key: MINIFICPP-550
> URL: https://issues.apache.org/jira/browse/MINIFICPP-550
> Project: Apache NiFi MiNiFi C++
> Issue Type: Bug
> Reporter: Mr TheSegfault
> Assignee: Daniel Bakai
> Priority: Major
> Fix For: 0.7.0
>
>
> A RocksDB Controller service will give us the ability to store arbitrary
> information into controller services that can later be sent via SiteToSite.
> This will support many of my monitoring and test use cases. Using RocksDB as
> a key/value store we can serialize and send this information periodically
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)