[
https://issues.apache.org/jira/browse/NIFI-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367820#comment-15367820
]
ASF GitHub Bot commented on NIFI-2078:
--------------------------------------
Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/563
@JPercivall @olegz @markap14 @mcgilman
I'd like to hear your thoughts.
I've been researching how state management works in a clustered environment,
and here is what I found:
1. LOCAL scope `Set` is executed on every node from onTrigger
2. LOCAL scope `View state` results are merged by
`ComponentStateEndpointMerger`, so each node's own state is returned in a
single merged response.
3. LOCAL scope `Clear state` is executed on every node, then one of the
successful results is returned. If the results are a mix of successes and
failures, the current implementation returns a success result.
4. CLUSTER scope `Set` is usually done from onTrigger, as in the ListXXX
processors, and we recommend using the `on primary node` scheduling strategy,
so it usually isn't a problem. Still, both DFMs and developers need to be careful.
5. CLUSTER scope `View state` is executed on every node, even though the same
state is returned from ZooKeeper. `ComponentStateEndpointMerger` then uses
one of those results. I think this should be executed **only on the primary
node**.
6. CLUSTER scope `Clear state` is executed on every node, each one updating the
same resource in ZooKeeper. I think this should be executed **only on the primary node**.
7. EXTERNAL scope `Set` happens externally; NiFi doesn't control it.
8. EXTERNAL scope `View state` is currently executed on every node, but for
Kafka, I think it should be executed **only on the primary node**. However,
there may be components that retrieve node-dependent EXTERNAL state in the
future. In that case, a differentiator such as an `@OnPrimaryNodeOnly`
annotation could be added, but it isn't required for now.
9. EXTERNAL scope `Clear state`: same as `View state`. Should be **only on
the primary node**.
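To make the proposal in items 5, 6, 8 and 9 concrete, here is a minimal sketch of the routing rule for `View state` and `Clear state` requests. Every name in it (`StateScope`, `ReplicationTarget`, `StateRequestRoutingSketch`, `targetFor`) is hypothetical and invented for illustration; none of them is an existing NiFi class, and the real change would live in the service facade and DAO layers.

```java
// Hypothetical types for illustration only; these are not NiFi APIs.
enum StateScope { LOCAL, CLUSTER, EXTERNAL }

enum ReplicationTarget { ALL_NODES, PRIMARY_NODE_ONLY }

final class StateRequestRoutingSketch {

    private StateRequestRoutingSketch() {
    }

    /**
     * Decides where a `View state` or `Clear state` request should run.
     * `Set` is not covered: it happens in onTrigger or in the external system.
     */
    static ReplicationTarget targetFor(StateScope scope) {
        switch (scope) {
            case LOCAL:
                // Each node holds its own state, so every node must answer.
                return ReplicationTarget.ALL_NODES;
            case CLUSTER:
            case EXTERNAL:
                // Every node would hit the same shared store (ZooKeeper, Kafka),
                // so one request from the primary node is assumed to be enough.
                return ReplicationTarget.PRIMARY_NODE_ONLY;
            default:
                throw new IllegalArgumentException("Unknown scope: " + scope);
        }
    }

    public static void main(String[] args) {
        for (StateScope scope : StateScope.values()) {
            System.out.println(scope + " -> " + targetFor(scope));
        }
    }
}
```

A component with node-dependent EXTERNAL state would need an extra input here, which is where something like the `@OnPrimaryNodeOnly` differentiator from item 8 would come in.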
While it works today, I'm not comfortable with accessing an external system like
ZooKeeper or Kafka when it's unnecessary, and clearing the same information from
multiple nodes sounds problematic. In addition to the EXTERNAL scope, I'd like to
modify the CLUSTER scope behavior so that these operations happen on the primary
node only, at the StandardNiFiServiceFacade and DAO layers.
What do you think?
> State management for processors whose states are managed externally
> -------------------------------------------------------------------
>
> Key: NIFI-2078
> URL: https://issues.apache.org/jira/browse/NIFI-2078
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Koji Kawamura
> Assignee: Koji Kawamura
> Fix For: 1.0.0
>
>
> By its nature, a given processor may keep state that it manages itself
> (using NiFi state management), state that is managed by some external
> service it interacts with (such as Kafka's offsets), or, in theory, both.
> With the new state management, we're giving users a way to
> reset state managed by NiFi for a given processor, but that doesn't apply to
> processors that have external state.
> We should consider offering a way to reset state that allows a processor to
> call out to whatever external store it impacts.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)