[
https://issues.apache.org/jira/browse/NIP-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063034#comment-18063034
]
Joe Witt commented on NIP-22:
-----------------------------
Mark - thank you. Just hit another case where this would be extremely helpful.
A large-scale deployment is sourcing data from *a lot* of different source
servers over a legacy, unscalable protocol, so there is no choice but to
ensure that at any moment only one node is doing the listing. The result of
the listing CAN be distributed in some cases, but in others doing so would
exceed the max allowed connections to the source.
And the data rates are... significant. So a single node would be overwhelmed
simply doing the listings alone. Without a solution like this we'd have to run
a separate cluster just to perform data gathering.
This solution would be very helpful to support large-scale clusters handling
super complex centralized data ingestion use cases.
> "Single Node Only" Execution Node
> ---------------------------------
>
> Key: NIP-22
> URL: https://issues.apache.org/jira/browse/NIP-22
> Project: NiFi Improvement Proposal
> Issue Type: New Feature
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> h3. Motivation
> NiFi supports running a Processor on "All Nodes" or "Primary Node Only." This
> works well when you have a few processors that don't require a large amount
> of overhead. However, some flows have many different processors that have to
> be run as Primary Node Only, and it can cause a single node to be heavily
> overloaded.
> There are times that we want to have a Primary Node because we want multiple
> flows running on the same node. But the vast majority of the time we simply
> care that the processor runs on 1 node only - but we don't care which node.
> To this end, we should introduce the ability to schedule a Processor to run
> on a "Single Node Only" but not care which node. If we have 100 Processors
> configured for Single Node Only, each node in the cluster should run some
> number of those Processors but not all. It is worth noting that we already
> have the necessary prerequisites in place for Leader Election, as we already
> use these for Primary Node and Cluster Coordinator - this simply allows for
> more leases to be used.
> h3. NiFi API Changes
> Introduce a new ExecutionNode value of `SINGLE_NODE_ONLY`.
> Deprecate the `@PrimaryNodeOnly` annotation in favor of a new
> `@SingleNodeOnly` annotation.
> Introduce a new `@OnSingleNodeElectionChange` annotation that can be added to
> processors, analogous to the existing `@OnPrimaryNodeStateChange`.
> h3. REST API Changes
> There should be no significant changes to the REST API, except to allow for
> the new Execution Node value.
> h3. UI Changes
> We will need a new icon similar to the 'P' badge that shows on Processors
> when the Processor is run on Primary Node Only. This also appears in the
> Summary table. We also need the ability to choose the new value for the
> "Execution Node".
> h3. Open Questions
> Do we truly have a need for Primary Node still? Or should this replace it
> entirely?
>
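To make the Motivation section's "each node in the cluster should run some
number of those Processors but not all" concrete, here is a minimal Java
sketch. It is not the proposed implementation: the issue relies on Leader
Election with one lease per Processor, whereas this sketch swaps in a plain
round-robin assignment purely to illustrate the intended outcome (every
Processor owned by exactly one node, with the load spread across the cluster).
The class name, method names, and node/processor identifiers are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SingleNodeAssignmentSketch {

    // Hypothetical helper: gives each "Single Node Only" processor exactly one
    // owning node. Round-robin stands in for the per-processor leader election
    // that the proposal actually describes.
    static Map<String, String> assign(List<String> processorIds, List<String> nodeIds) {
        if (nodeIds.isEmpty()) {
            throw new IllegalArgumentException("cluster has no nodes");
        }
        Map<String, String> owner = new TreeMap<>();
        for (int i = 0; i < processorIds.size(); i++) {
            owner.put(processorIds.get(i), nodeIds.get(i % nodeIds.size()));
        }
        return owner;
    }

    // Counts how many processors each node ended up owning.
    static Map<String, Integer> countPerNode(Map<String, String> owner) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : owner.values()) {
            counts.merge(node, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-1", "node-2", "node-3", "node-4");
        List<String> processors = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            processors.add("processor-" + i);
        }
        Map<String, String> owner = assign(processors, nodes);
        // prints {node-1=25, node-2=25, node-3=25, node-4=25}
        System.out.println(countPerNode(owner));
    }
}
```

With "Primary Node Only", all 100 Processors in this example would land on a
single node. A real election-based scheme would also need to re-elect an owner
when a node leaves the cluster, which this sketch does not attempt.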
--
This message was sent by Atlassian Jira
(v8.20.10#820010)