[ 
https://issues.apache.org/jira/browse/MINIFICPP-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861025#comment-16861025
 ] 

Mr TheSegfault commented on MINIFICPP-550:
------------------------------------------

Thanks [~bakaid] for that thorough comment.

 

Controller services can be shared with any component. The impetus for some of 
my comments was that these aren't necessarily component states, but shared 
states. While we could have per-processor state, we may also want to facilitate 
communication between processors, or between a processor and a controller 
service. A variable registry is a trivial example of this. Another example may 
be storing a list of blocked IPs that arise from one processor but are shared 
with subsequent processors in the graph.
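
To make the blocked-IP example a bit more concrete, here is a minimal sketch 
of two processors sharing state through one service object. Every name in it 
(BlockedIpService, DetectProcessor, FilterProcessor) is hypothetical and made 
up for illustration; none of it is existing MiNiFi C++ API.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical shared-state service; not an existing MiNiFi C++ class.
class BlockedIpService {
 public:
  void block(const std::string& ip) {
    std::lock_guard<std::mutex> lock(mutex_);
    blocked_.insert(ip);
  }

  bool isBlocked(const std::string& ip) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return blocked_.count(ip) > 0;
  }

 private:
  mutable std::mutex mutex_;  // processors may run on different threads
  std::set<std::string> blocked_;
};

// Hypothetical upstream processor: publishes offending IPs to the service.
class DetectProcessor {
 public:
  explicit DetectProcessor(std::shared_ptr<BlockedIpService> svc)
      : svc_(std::move(svc)) {
    assert(svc_ != nullptr);
  }
  void onTrigger(const std::string& offending_ip) { svc_->block(offending_ip); }

 private:
  std::shared_ptr<BlockedIpService> svc_;
};

// Hypothetical downstream processor: consults the same service instance.
class FilterProcessor {
 public:
  explicit FilterProcessor(std::shared_ptr<BlockedIpService> svc)
      : svc_(std::move(svc)) {
    assert(svc_ != nullptr);
  }
  bool accepts(const std::string& ip) const { return !svc_->isBlocked(ip); }

 private:
  std::shared_ptr<BlockedIpService> svc_;
};
```

The point is only that neither processor owns the state: both hold a 
reference to the same service, so the state is shared rather than 
per-component.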

The paradigm from NiFi isn't one that I'm against. The "singleton cache" was 
meant to reflect a similar idea, but one in which we don't need to augment 
ProcessContext, since that coupling doesn't cover everything I was hoping we 
could achieve by having reporting tasks and other controller services update 
the stored state. If I recall the history correctly, StateManager originated 
with this idea, but it was coupled to ProcessContext because no other sensible 
method of retrieval existed. This is where the singleton idea originated.


Controller services cover both aspects, as we can have "known controller 
service names." For example, we can have a property in the minifi.properties 
file that defines a controller service name that specifies a state manager. 
This can be used by virtually any component. Other benefits include being able 
to easily turn this on/off via command and control (by updating the flow). It 
also allows us to more easily change the type. The negative here is that the 
config YAML file defines the state manager implementation. I rationalized this 
via the concept that the config YAML file already defines not only the graph 
but also the state of the agent. Retrieval can be made via the process context 
on getting the provider, and we can still have an option to store state for 
only the processor or to share it amongst many. Using linked services it would 
also be possible to intersect results from a volatile and a persistent repo. 
While RocksDB can be configured so that the WAL is off or highly delayed, this 
may allow us to do interesting things if using controller services. Finally, 
there is also precedent for injecting (maybe through properties) a default 
controller service if none is specified. This would allow us to keep it 
configurable via command while still having one defined if it is not specified 
in the update.
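
As a rough sketch of the "known controller service name" lookup with a 
default fallback: the property key, the interface, and both implementations 
below are assumptions invented for this example, not the actual MiNiFi C++ 
API or a real minifi.properties key.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical interface; the real state manager API may look quite different.
class StateManagerService {
 public:
  virtual ~StateManagerService() = default;
  virtual std::string type() const = 0;
};

class VolatileStateManager : public StateManagerService {
 public:
  std::string type() const override { return "volatile"; }
};

class RocksDbStateManager : public StateManagerService {
 public:
  std::string type() const override { return "rocksdb"; }
};

using Properties = std::map<std::string, std::string>;
using ServiceRegistry =
    std::map<std::string, std::shared_ptr<StateManagerService>>;

// Hypothetical property key, made up for this sketch.
const char* const kStateManagerProperty = "example.state.manager.service.name";

// Returns the service named by the property if the flow defines it,
// otherwise the injected default.
std::shared_ptr<StateManagerService> resolveStateManager(
    const Properties& props, const ServiceRegistry& flow_services,
    const std::shared_ptr<StateManagerService>& default_service) {
  assert(default_service != nullptr);
  auto name = props.find(kStateManagerProperty);
  if (name != props.end()) {
    auto svc = flow_services.find(name->second);
    if (svc != flow_services.end() && svc->second) {
      return svc->second;
    }
  }
  return default_service;
}
```

With something along these lines, a flow update that adds, removes, or 
changes the type of the named service changes what components receive, and 
the default covers the case where the update does not define one.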

 

There are a lot of ideas here, and I'm excited to see what you come up with. 
My comment about configuring RocksDB to be mostly in memory doesn't ignore the 
fact that, yes, there may certainly be portions of state that aren't ever 
going to be persisted. In these cases those extension points could/should 
allow for different state manager types, if we can use that same nomenclature. 
I don't have an overly strong preference in any of these decision points, 
except that coupling ProcessContext with StateManager has never felt like the 
right answer to me, since state can often be more than local, and how does one 
resolve that coupling if you want state that is both processor-local and 
shareable between processors? Perhaps even that could be made more graceful. 
Ultimately the process context is defined as a 'bridge between the processor 
and the nifi-framework' [1]. I would take that a step further and say that its 
scope is limited to the lifetime of the thread, which in my opinion has 
implications for how we handle state.
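
One way to picture having both processor-local and shared state behind a 
single store, independent of any ProcessContext, is to namespace keys by 
scope. This is only a sketch under my own assumptions; ScopedStateStore, 
StateScope, and the key layout are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical scope selector for this sketch.
enum class StateScope { Local, Shared };

// Hypothetical store that outlives any single process context or thread.
class ScopedStateStore {
 public:
  void set(StateScope scope, const std::string& component_id,
           const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mutex_);
    data_[qualify(scope, component_id, key)] = value;
  }

  // Returns true and fills `value` when the key exists in the given scope.
  bool get(StateScope scope, const std::string& component_id,
           const std::string& key, std::string& value) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = data_.find(qualify(scope, component_id, key));
    if (it == data_.end()) {
      return false;
    }
    value = it->second;
    return true;
  }

 private:
  // Local keys are prefixed with the owning component's id; shared keys are
  // visible to every component regardless of who wrote them.
  static std::string qualify(StateScope scope, const std::string& component_id,
                             const std::string& key) {
    assert(scope == StateScope::Shared || !component_id.empty());
    return scope == StateScope::Local ? "local/" + component_id + "/" + key
                                      : "shared/" + key;
  }

  mutable std::mutex mutex_;
  std::map<std::string, std::string> data_;
};
```

A processor would then write its own offsets under StateScope::Local and 
publish anything other components need under StateScope::Shared, without the 
store caring which thread or context made the call.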



Hopefully that all made sense. Thanks

[1] 
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/ProcessContext.java#L30

> Create RocksDB Controller Service
> ---------------------------------
>
>                 Key: MINIFICPP-550
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-550
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: Bug
>            Reporter: Mr TheSegfault
>            Assignee: Daniel Bakai
>            Priority: Major
>             Fix For: 0.7.0
>
>
> A RocksDB Controller service will give us the ability to store arbitrary 
> information into controller services that can later be sent via SiteToSite. 
> This will support many of my monitoring and test use cases. Using RocksDB as  
> a key/value store we can serialize and send this information periodically



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
