[
https://issues.apache.org/jira/browse/NIP-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kevin Doran updated NIP-14:
---------------------------
Description:
*Motivation*
Increasingly, NiFi clusters are being deployed into environments where network
traffic is managed through a non-NiFi component, such as a gateway, ingress
controller, load balancer, or reverse proxy. These include cloud and
containerized deployment environments. NiFi infrastructure is sometimes managed
via infrastructure as code (IaC) frameworks as part of a larger system.
Additionally, many NiFi operators desire to deploy flow definitions
programmatically via automated deployment pipelines. Flow versioning and
promotion of flow definitions from non-production to production NiFi clusters
has become a best practice in the community.
The combination of these factors leads to situations where it would be
advantageous if data ingress ports defined as part of a flow definition (eg, a
ListenHTTP processor) could be dynamically discovered by infrastructure and
deployment software in order to automatically configure managed networking
components responsible for managing ingress traffic to NiFi clusters.
*Scope and Description*
There are currently 9 processors and 1 controller service included in the NiFi
source code distribution that define Listen Ports:
- HandleHTTPRequest
- ListenFTP
- ListenHTTP
- ListenOTLP
- ListenSyslog
- ListenTCP
- ListenTrapSNMP
- ListenUDP
- ListenUDPRecord
- JettyWebSocketServer (CS)
(Note that ListenSlack uses the Listen* naming convention, but this processor
actually initiates a two-way TCP connection with a Slack workspace in order to
start receiving events. It does not create a server process to accept inbound
connections.)
Of course, it is possible that community members have developed (or could
develop) custom NiFi extensions that create Listen Ports. Apache NiFi components
usually serve as reference implementations in these cases. The goal of this feature
is to introduce standard interfaces for declaring data ingress ports in
components as well as framework mechanisms for a standard discovery process of
such ports, making it possible to support Apache NiFi provided components as
well as external, third-party extensions.
All of the example components listed above are ConfigurableComponents that
define a Port property that establishes the port number to bind to in
order to listen for payloads and connections from external clients. A new
Property Descriptor field in nifi-api would be a natural place to annotate that
a property defines a Listen Port. This would allow discovery of Listen Ports
both at runtime by the framework, as well as in flow definitions, which is very
advantageous as it allows determining network ingress requirements based solely
on the static flow definition before it has even been deployed into a NiFi
runtime environment.
In addition to the enhancement to Property Descriptors in the NiFi API described above,
the NiFi Framework and NiFi REST API would be modified to dynamically discover
configurable components (ie, processors and controller services) containing
Listen Ports and list those components along with their current configuration
programmatically.
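As a rough illustration of the discovery idea, the following plain-Java sketch
walks a set of configured components and collects every property flagged as a
Listen Port. All of the types here (PropertySpec, Component, DiscoveredPort) and
the listenPort flag are hypothetical stand-ins invented for this example; they
are not the nifi-api types and do not reflect the draft branch linked below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types come from nifi-api.
class ListenPortDiscovery {

    // Stand-in for a property descriptor carrying a "this is a Listen Port" flag.
    record PropertySpec(String name, boolean listenPort) {}

    // Stand-in for a configured component (processor or controller service).
    record Component(String type, List<PropertySpec> properties, Map<String, String> values) {}

    // One discovered data ingress port.
    record DiscoveredPort(String componentType, String propertyName, int port) {}

    // Collects every configured property that is flagged as a Listen Port.
    static List<DiscoveredPort> discover(List<Component> components) {
        List<DiscoveredPort> found = new ArrayList<>();
        for (Component c : components) {
            for (PropertySpec p : c.properties()) {
                String value = c.values().get(p.name());
                if (p.listenPort() && value != null) {
                    found.add(new DiscoveredPort(c.type(), p.name(), Integer.parseInt(value)));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Property names and values below are illustrative only.
        Component http = new Component("ListenHTTP",
                List.of(new PropertySpec("Listening Port", true),
                        new PropertySpec("Base Path", false)),
                Map.of("Listening Port", "8081", "Base Path", "contentListener"));
        Component other = new Component("UpdateAttribute", List.of(), Map.of());
        System.out.println(discover(List.of(http, other)));
    }
}
```

Because the flag lives on the property rather than on the component type, the
same walk could in principle run against a static flow definition or a live
runtime, which is the property the proposal is after.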
When looking at how a Listen Port should be defined, the most important aspect
is the layer 4 transport protocol, as that is usually the most relevant
information required to automatically configure external networking components
such as gateways, ingress controllers, load balancers, and proxies. Of
secondary importance is the layer 7 application protocol, if any. Looking at
existing projects that have solved similar problems, we can take Kubernetes as
an example. Kubernetes allows services, which can be arbitrary containers
running any process, to declare exposed ports. For service ports, the
Kubernetes network data model just allows for declaring port number and
[transport
protocol|https://kubernetes.io/docs/reference/networking/service-protocols/]
using well-defined enum values. Optionally, [application
protocols|https://kubernetes.io/docs/concepts/services-networking/service/#application-protocol]
can be provided as a hint using freeform strings and, when available, can be
used by the provider for richer support of the app protocol. This data model
serves as a good guide for NiFi to model a similar situation, just replacing
services with NiFi extensions.
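A data model in that spirit might look like the sketch below: a port number, a
required transport protocol drawn from a closed enum, and optional freeform
application protocol hints. The names ListenPort, TransportProtocol, and
applicationProtocols are assumptions made for illustration and are not taken
from nifi-api or from the draft branch.

```java
import java.util.List;

// Hypothetical data model; names are illustrative and not taken from nifi-api.
class ListenPortModel {

    // Closed set of layer 4 transports, analogous to a well-defined enum value.
    enum TransportProtocol { TCP, UDP }

    // Port number, required transport, and optional freeform application protocol hints.
    record ListenPort(int portNumber, TransportProtocol transport, List<String> applicationProtocols) {
        ListenPort {
            if (portNumber < 1 || portNumber > 65535) {
                throw new IllegalArgumentException("port out of range: " + portNumber);
            }
            if (transport == null) {
                throw new IllegalArgumentException("transport protocol is required");
            }
            // Hints are optional: normalize a missing list to empty and make it immutable.
            applicationProtocols = applicationProtocols == null
                    ? List.of()
                    : List.copyOf(applicationProtocols);
        }
    }

    public static void main(String[] args) {
        ListenPort http = new ListenPort(8443, TransportProtocol.TCP, List.of("http"));
        ListenPort raw = new ListenPort(5140, TransportProtocol.UDP, null);
        System.out.println(http);
        System.out.println(raw);
    }
}
```

Keeping the transport as an enum and the application protocol as a string
mirrors the split described above: the former is what networking components
must know, the latter is only a hint.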
A proposed, draft update to the NiFi API to introduce the concept of Listen
Ports is available here:
[https://github.com/kevdoran/nifi-api/tree/listen-ports]
*Compatibility*
Largely, this is a backwards-compatible change.
The proposed approach is entirely additive and optional: Once implemented and
released, extension components can opt in to declaring Listen Ports that they
create. The burden to do so is minimal for component authors; in most cases, a
few lines of code. Once done, flow registries and operational tools built atop
them can dynamically discover flows that require data ingress rules, and the
NiFi framework can dynamically discover components that provide data ingress
ports and their port configuration to make it available via the REST API for
external components. These benefits give component authors an incentive to opt
in to this feature, without any breaking changes that force them to update their
components to continue working on newer NiFi versions.
There is one minor breaking change to the properties for the ListenSyslog
processor. Syslog is an application protocol that can work over TCP or UDP. The
current ListenSyslog processor allows specifying the port to listen on and a
second property (Protocol) for specifying the transport protocol to accept (TCP
or UDP). The proposed design for the NiFi API Property Descriptor requires
declaring a static L4 transport protocol when defining a Listen Port. This
allows knowing the transport protocol based on a flow definition without
knowing what the runtime configuration will be, which greatly simplifies rules
NiFi operators may want to implement: for example, an operator might have a
policy to block all inbound UDP traffic for security reasons and could inspect
flow definitions to determine compatibility problems with a target NiFi runtime
environment.
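The kind of operator rule described above could be sketched as follows,
assuming the declared Listen Ports have already been read from a flow
definition. DeclaredPort, Transport, and violations are hypothetical names used
only for this example; the component and property names in main are
illustrative.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these types are illustrative, not part of nifi-api.
class IngressPolicyCheck {

    enum Transport { TCP, UDP }

    // A Listen Port as it might be read from a static flow definition.
    record DeclaredPort(String componentType, String propertyName, Transport transport) {}

    // Returns every declared port whose transport is blocked in the target environment.
    static List<DeclaredPort> violations(List<DeclaredPort> declared, Set<Transport> blocked) {
        return declared.stream()
                .filter(p -> blocked.contains(p.transport()))
                .toList();
    }

    public static void main(String[] args) {
        List<DeclaredPort> flow = List.of(
                new DeclaredPort("ListenHTTP", "Listening Port", Transport.TCP),
                new DeclaredPort("ListenUDP", "Port", Transport.UDP));
        // Example policy: no inbound UDP traffic.
        List<DeclaredPort> rejected = violations(flow, Set.of(Transport.UDP));
        rejected.forEach(p -> System.out.println("Blocked by policy: " + p));
    }
}
```

The check needs no runtime configuration, only the statically declared
transport, which is why a static transport per Listen Port matters here.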
The proposed solution for ListenSyslog is replacing the existing Port and
Protocol properties with TCP Port and UDP Port properties that are mutually
exclusive (only one is allowed to be configured at a time). The migrate
properties feature that was introduced for in-place flow version changes will
allow us to migrate flows from the old configuration to the new configuration
automatically for users, and migration guidance for the first NiFi release to
include the modified processor can cover the remaining cases.
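The mapping from the old properties to the new ones might be sketched as below.
This deliberately uses a plain Map rather than the NiFi property migration
API, and it assumes the property names given above ("Port", "Protocol", "TCP
Port", "UDP Port"); the SyslogPortMigration class and its migrate method are
hypothetical. Configurations that lack either old property are left untouched,
on the assumption that migration guidance covers them.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch using a plain Map; this is not the NiFi migration API.
class SyslogPortMigration {

    // Moves the old Port value to "TCP Port" or "UDP Port" based on the old Protocol value.
    static Map<String, String> migrate(Map<String, String> oldConfig) {
        Map<String, String> migrated = new HashMap<>(oldConfig);
        String port = oldConfig.get("Port");
        String protocol = oldConfig.get("Protocol");
        if (port == null || protocol == null) {
            // Nothing to migrate automatically; leave the configuration as-is.
            return migrated;
        }
        String target = switch (protocol.toUpperCase(Locale.ROOT)) {
            case "TCP" -> "TCP Port";
            case "UDP" -> "UDP Port";
            default -> throw new IllegalArgumentException("Unexpected protocol: " + protocol);
        };
        migrated.remove("Port");
        migrated.remove("Protocol");
        // Only one of the two new properties is set, keeping them mutually exclusive.
        migrated.put(target, port);
        return migrated;
    }

    public static void main(String[] args) {
        System.out.println(migrate(Map.of("Port", "514", "Protocol", "UDP")));
    }
}
```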
*Verification*
The Verification process for this feature would include:
- Unit tests for components that define Listen Ports to make sure that they
implement the new interfaces and that, when configured, the correct Listen Port
definition is discoverable (ie, the return value for new interfaces matches the
expected values based on the component configuration).
- Integration tests that verify when multiple Listen components exist in a
NiFi Cluster, the Listen Ports they create are correctly discoverable via the
NiFi REST API.
- Instructions for peer-reviewers to manually verify the feature
implementation.
- Documentation updates, primarily to the NiFi Developer Guide, to add
instructions and guidance for implementing new listen components in a manner
that is compatible with this new feature.
*Alternatives*
The following alternatives were considered:
# {_}No changes to NiFi{_}; instead put the responsibility of discovering
Listen Ports solely on external components, such as deployment scripts and
infrastructure management logic. For example, external code could just "know"
(ie, using hardcoded logic) that ListenHTTP defines a data ingress port via a
property that accepts HTTP requests and look for instances of that known
Processor type. Alternatively, we could go with a convention-based approach
such as "processors that have a type name starting with Listen* and a Port
property." Both of these are very brittle, lack discoverability by extension
component authors, and do not account for the large community of NiFi
developers that may use different conventions than those used for Apache NiFi
components. For these reasons, this alternative was deemed insufficient.
# {_}A larger change that also extends the new port discovery feature
to include the various framework-level ingress ports{_}, such as the remote
input port used for site-to-site protocol or cluster communication ports. For
example, maybe a new framework-level concept of a NiFi Gateway allows operators
and flow authors to put all network ingress rules in one place. While this may
have some advantages, and could be considered in the future, it was ultimately
deemed too large a change at this time. It likely would include breaking
changes to configuration files and APIs, and therefore would be more
reasonable to implement as a NiFi 3.0 / major version change, should the need
ever arise. Additionally, framework-level ingress ports are defined via a
different process (usually in nifi.properties) and once set do not change
often; therefore, they are already much more manageable by something like IaC
logic and less of a problem compared to ports that are part of flow
definitions. This means that including them in the scope of this feature offers
much less value despite greatly increasing the scope.
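The brittleness of the convention-based approach in the first alternative can
be seen with a small sketch. It applies only the name half of that convention
(type name starts with "Listen") to the ten components listed under Scope and
Description; the NameConventionHeuristic class and its methods are hypothetical
and exist only for this illustration.

```java
import java.util.List;

// Hypothetical sketch of the name-based half of the convention rule.
class NameConventionHeuristic {

    // The components listed in this proposal as defining Listen Ports.
    static final List<String> LISTED = List.of(
            "HandleHTTPRequest", "ListenFTP", "ListenHTTP", "ListenOTLP", "ListenSyslog",
            "ListenTCP", "ListenTrapSNMP", "ListenUDP", "ListenUDPRecord", "JettyWebSocketServer");

    // Brittle rule: a component is a listener if its type name starts with "Listen".
    static boolean looksLikeListener(String typeName) {
        return typeName.startsWith("Listen");
    }

    // Returns the known listening components that the naming rule fails to detect.
    static List<String> missedBy(List<String> knownListeners) {
        return knownListeners.stream()
                .filter(name -> !looksLikeListener(name))
                .toList();
    }

    public static void main(String[] args) {
        // Prints [HandleHTTPRequest, JettyWebSocketServer]
        System.out.println(missedBy(LISTED));
    }
}
```

Even within the Apache NiFi distribution, the naming rule misses two of the
ten listed components, before third-party extensions are considered.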
The proposed scope and description offer the greatest benefit for minimal effort
with practically no breaking changes.
was:
*Motivation*
Increasingly, NiFi clusters are being deployed into environments where network
traffic is managed through a non-NiFi component, such as a gateway, ingress
controller, load balancer, or reverse proxy. These include cloud and
containerized deployment environments. NiFi infrastructure is sometimes managed
via infrastructure as code (IaC) frameworks as part of a larger system.
Additionally, many NiFi operators desire to deploy flow definitions
programmatically via automated deployment pipelines. Flow versioning and
promotion of flow definitions from non-production to production NiFi clusters
has become a best practice in the community.
The combination of these factors leads to situations where it would be
advantageous if data ingress ports defined as part of a flow definition (eg, a
ListenHTTP processor) could be dynamically discovered by infrastructure and
deployment software in order to automatically configure managed networking
components responsible for managing ingress traffic to NiFi clusters.
*Scope and Description*
There are currently 9 processors and 1 controller service included in the NiFi
source code distribution that define Listen Ports:
- HandleHTTPRequest
- ListenFTP
- ListenHTTP
- ListenOTLP
- ListenSyslog
- ListenTCP
- ListenTrapSNMP
- ListenUDP
- ListenUDPRecord
- JettyWebSocketServer (CS)
(Note, that ListenSlack uses the Listen* naming convention, but this processor
actually initiates a two-way TCP connection with a Slack workspace in order to
start receiving events. It does not create a server process to accept inbound
connections.)
Of course, it is possible that community members have (or could) develop custom
NiFi extensions that create Listen Ports. Apache NiFi components usually serve
as example, reference implementations in these cases. The goal of this feature
is to introduce standard interfaces for declaring data ingress ports in
components as well as framework mechanisms for a standard discovery process of
such ports, making it possible to support Apache NiFi provided components as
well as external, third-party extensions.
All of the example components listed above are ConfigurableComponents that
define a Port property the establishes the numbered port that is bound to in
order to listen to payloads and connections from external clients. A new
Property Descriptor field in nifi-api would be a natural place to annotate that
a property defines a Listen Port. This would allow discovery of Listen Ports
both at runtime by the framework, as well as in flow definitions, which is very
advantageous as it allows determining network ingress requirements based solely
on the static flow definition before it has even been deployed into a NiFi
runtime environment.
In addition to additions to Property Descriptors in NiFi API described above,
the NiFi Framework and NiFi REST API would be modified to dynamically discover
configurable components (ie, processors and controller services) containing
Listen Ports and list those components along with their current configuration
programmatically.
When looking at how a Listen Port should be defined, the most important aspect
is the layer 4 transport protocol, as that is usually the most relevant
information required to automatically configure external networking components
such as gateways, ingress controllers, load balancers, and proxies. Of
secondary importance is the layer 7 application protocol, if any. Looking at
existing projects that have solved similar problems, we can take Kubernetes as
an example. Kubernetes allows services, which can be arbitrary containers
running any process, to declare exports ports. For service ports, the
Kubernetes network data model just allows for declaring port number and
[transport
protocol|https://kubernetes.io/docs/reference/networking/service-protocols/]
using well-defined enum values. Optionally, [application
protocols|https://kubernetes.io/docs/concepts/services-networking/service/#application-protocol]
can be provided as a hint, use freeform strings, and when available can be
used by the provider for richer support of the app protocol. This data model
serves as a good guide for NiFi to model a similar situation, just replacing
services with NiFi extensions.
A proposed, draft update to the NiFi API to introduce to concept of Listen
Ports is available here:
[https://github.com/kevdoran/nifi-api/tree/listen-ports]
*Compatibility*
Largely, this is a backwards compatible change.
The proposed approach is entirely additive and optional: Once implemented and
released, extension components can opt-in to declaring Listen Ports that they
create. The burden to do so is minimal for component authors; in most cases, a
few lines of code. Once done, flow registries and operational tools built atop
them can dynamically discover flows that require data ingress rules, and the
NiFi framework can dynamically discover components that provide data ingress
ports and their port configuration to make it available via the REST API for
external components. These offer incentives for component authors to opt-in to
this feature, without any breaking changes that forces them to update their
components to continue working on newer NiFi versions.
There is one minor breaking change to the properties for the ListenSyslog
processor. Syslog is an application protocol that can work over TCP or UDP. The
current ListenSyslog processor allows specifying the port to listen on as a
second property for specifying the transport protocol to accept (TCP or UDP).
The proposed design for the NiFi API Property Descriptor would declaring a
static transport protocol associated with a Listen Port. This allows knowing
the transport protocol based on a flow definition without knowing what the
runtime configuration will be, which greatly simplifies rules NiFi operators
may want to codify such as if a flow definition is compatible with a target
NiFi Runtime (eg, an operator may by policy block all inbound UDP traffic for
security reasons.)
The proposed solution to this is replacing the ListenSyslog processor Port and
Protocol properties with TCP Port and UDP Port properties that are mutually
exclusive (only one is allowed to be configured at a time). The migrate
properties feature that was introduced for in-place flow version changes will
allow us to migrate flows from the old configuration to the new configuration
automatically for users, and migration guidance for the first NiFi release to
include the modified processor can cover the remaining cases.
*Verification*
The Verification process for this feature would include:
- Unit tests for components that define Listen Ports to make sure that they
implement the new interfaces, and then when configured, the correct Listen Port
definition is discoverable (ie, the return value for new interfaces matches the
expected values based on the component configuration)
- Integration tests that verify when multiple Listen components exist in a
NiFi Cluster, the Listen Ports they create are correctly discoverable via the
NiFi REST API.
- Instructions for peer-reviewers to manually verify the feature
implementation.
- Documentation updates, primarily to the NiFi Developer Guide, to add
instructions and guidance for implementing new listen components in a manner
that is compatible with this new feature.
*Alternatives*
The following alternatives were considered:
# {_}No changes to NiFi{_}; instead put the responsibility of discovering
Listen Ports solely on external components, such as deployment scripts and
infrastructure management logic. For example, external code could just "know"
(ie, using hardcoded logic) that ListenHTTP defines a data ingress port via a
property that accepts HTTP requests and look for instances of that known
Processor type. Alternatively, we could go with a convention-based approach
such as "processors that have a type name starting with Listen* and a Port
property." Both of these are very brittle, lack discoverability by extension
component authors, and do not account for the large community of NiFi
developers that may use different conventions than those used for Apache NiFi
components. For these reasons, this alternative was deemed insufficient.
# {_}A larger change that also tries to unify the new feature port discovery
to include the various framework-level ingress ports{_}, such as the remote
input port used for site-to-site protocol or cluster communication ports. For
example, maybe a new framework-level concept of a NiFi Gateway allows operators
and flow authors to put all network ingress rules in one place. While this may
have some advantages, and could be considered in the future, it was ultimately
deemed too large a change at this time. It likely would include breaking
changes to configuration files and and APIs, and therefore would be more
reasonable to implement as a NiFi 3.0 / major version change, should the need
ever arise. Additionally, framework-level ingress ports are defined via a
different process (usually in nifi.properties) and once set do not change
often; therefore, they are already much more management by something like IaC
logic and less of a problem compared to ports that are part of flow
definitions. This means that including them in the scope of this feature offers
much less value despite greatly increasing the scope.
The proposed scope and description offers the best benefit for minimal effort
with practically no breaking changes.
> Support Dynamic Discovery of Data Ingress Ports
> -----------------------------------------------
>
> Key: NIP-14
> URL: https://issues.apache.org/jira/browse/NIP-14
> Project: NiFi Improvement Proposal
> Issue Type: Improvement
> Reporter: Kevin Doran
> Assignee: Kevin Doran
> Priority: High
>