Kevin Doran created NIP-14:
------------------------------
Summary: Support Dynamic Discovery of Data Ingress Ports
Key: NIP-14
URL: https://issues.apache.org/jira/browse/NIP-14
Project: NiFi Improvement Proposal
Issue Type: Improvement
Reporter: Kevin Doran
Assignee: Kevin Doran
*Motivation*
Increasingly, NiFi clusters are being deployed into environments where network
traffic is managed through a non-NiFi component, such as a gateway, ingress
controller, load balancer, or reverse proxy. These include cloud and
containerized deployment environments. NiFi infrastructure is sometimes managed
via infrastructure as code (IaC) frameworks as part of a larger system.
Additionally, many NiFi operators want to deploy flow definitions
programmatically via automated deployment pipelines. Flow versioning and
promotion of flow definitions from non-production to production NiFi clusters
have become best practices in the community.
The combination of these factors leads to situations where it would be
advantageous if data ingress ports defined as part of a flow definition (eg, a
ListenHTTP processor) could be dynamically discovered by infrastructure and
deployment software in order to automatically configure managed networking
components responsible for managing ingress traffic to NiFi clusters.
*Scope and Description*
There are currently 9 processors and 1 controller service included in the NiFi
source code distribution that define Listen Ports:
- HandleHTTPRequest
- ListenFTP
- ListenHTTP
- ListenOTLP
- ListenSyslog
- ListenTCP
- ListenTrapSNMP
- ListenUDP
- ListenUDPRecord
- JettyWebSocketServer (CS)
(Note that ListenSlack uses the Listen* naming convention, but this processor
actually initiates a two-way TCP connection with a Slack workspace in order to
start receiving events. It does not create a server process to accept inbound
connections.)
Of course, community members may have developed (or could develop) custom
NiFi extensions that create Listen Ports. Apache NiFi components usually serve
as reference implementations in these cases. The goal of this feature
is to introduce standard interfaces for declaring data ingress ports in
components as well as framework mechanisms for a standard discovery process of
such ports, making it possible to support Apache NiFi provided components as
well as external, third-party extensions.
All of the example components listed above are ConfigurableComponents that
define a Port property that establishes the numbered port that is bound in
order to listen for payloads and connections from external clients. A new
Property Descriptor field in nifi-api would be a natural place to annotate that
a property defines a Listen Port. This would allow discovery of Listen Ports
both at runtime by the framework, as well as in flow definitions, which is very
advantageous as it allows determining network ingress requirements based solely
on the static flow definition before it has even been deployed into a NiFi
runtime environment.
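
As a rough illustration of discovery from a static flow definition, the sketch
below walks an exported flow definition that has been loaded as a Python dict
and collects every property whose descriptor is annotated as a Listen Port. The
annotation field name (listenPortDefinition) and its transportProtocol and
applicationProtocols keys are assumptions made for this example, as is the
"Port" property name; the proposal does not specify the serialized form.

```python
from typing import Any, Iterator

# Hypothetical descriptor field name; the serialized form of the annotation
# is not specified in this proposal.
LISTEN_PORT_FIELD = "listenPortDefinition"


def find_listen_ports(group: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one record per configured Listen Port in a process group tree."""
    components = group.get("processors", []) + group.get("controllerServices", [])
    for component in components:
        descriptors = component.get("propertyDescriptors", {})
        properties = component.get("properties", {})
        for prop_name, descriptor in descriptors.items():
            definition = descriptor.get(LISTEN_PORT_FIELD)
            if definition is None:
                continue  # property is not annotated as a Listen Port
            value = properties.get(prop_name)
            if value is None:
                continue  # annotated, but no port is configured
            yield {
                "component": component.get("name"),
                "type": component.get("type"),
                "property": prop_name,
                "port": value,  # kept as a string: it may be a parameter reference
                "transportProtocol": definition.get("transportProtocol"),
                "applicationProtocols": definition.get("applicationProtocols", []),
            }
    # Recurse into nested process groups.
    for child in group.get("processGroups", []):
        yield from find_listen_ports(child)


# Illustrative flow definition fragment (not real NiFi output).
example_flow = {
    "processors": [
        {
            "name": "Receive Events",
            "type": "org.apache.nifi.processors.standard.ListenHTTP",
            "properties": {"Port": "8081"},
            "propertyDescriptors": {
                "Port": {
                    LISTEN_PORT_FIELD: {
                        "transportProtocol": "TCP",
                        "applicationProtocols": ["http"],
                    }
                }
            },
        },
        {
            "name": "Tag Events",
            "type": "org.apache.nifi.processors.attributes.UpdateAttribute",
            "properties": {"source": "http"},
            "propertyDescriptors": {"source": {}},
        },
    ],
    "processGroups": [
        {
            "processors": [
                {
                    "name": "Receive Metrics",
                    "type": "org.apache.nifi.processors.standard.ListenUDP",
                    "properties": {"Port": "#{udp.port}"},
                    "propertyDescriptors": {
                        "Port": {LISTEN_PORT_FIELD: {"transportProtocol": "UDP"}}
                    },
                }
            ]
        }
    ],
}

for record in find_listen_ports(example_flow):
    print(record["component"], record["transportProtocol"], record["port"])
```

Because the scan relies only on the annotation, it would work for third-party
components without knowing their type names in advance.
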
In addition to the Property Descriptor changes in the NiFi API described above,
the NiFi Framework and NiFi REST API would be modified to dynamically discover
configurable components (ie, processors and controller services) containing
Listen Ports and to list those components, along with their current
configuration, programmatically.
When looking at how a Listen Port should be defined, the most important aspect
is the layer 4 transport protocol, as that is usually the most relevant
information required to automatically configure external networking components
such as gateways, ingress controllers, load balancers, and proxies. Of
secondary importance is the layer 7 application protocol, if any. Looking at
existing projects that have solved similar problems, we can take Kubernetes as
an example. Kubernetes allows services, which can be arbitrary containers
running any process, to declare exposed ports. For service ports, the
Kubernetes network data model allows declaring only a port number and a
[transport
protocol|https://kubernetes.io/docs/reference/networking/service-protocols/]
using well-defined enum values. Optionally, [application
protocols|https://kubernetes.io/docs/concepts/services-networking/service/#application-protocol]
can be provided as hints using freeform strings; when available, the provider
can use them for richer support of the application protocol. This data model
serves as a good guide for NiFi to model a similar situation, just replacing
services with NiFi extensions.
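
To make the shape of that data model concrete, here is a minimal sketch of a
Listen Port record with an enum transport protocol and optional freeform
application protocol hints, plus a helper that renders it as one entry of a
Kubernetes Service spec.ports list. The class, field, and function names are
hypothetical and are not taken from the draft nifi-api branch; only the
Kubernetes field names (name, port, protocol, appProtocol) are real.

```python
from dataclasses import dataclass
from enum import Enum


class TransportProtocol(Enum):
    """Layer 4 transport, restricted to well-defined values."""
    TCP = "TCP"
    UDP = "UDP"


@dataclass(frozen=True)
class ListenPort:
    """Hypothetical model of a declared Listen Port."""
    port_number: int
    transport_protocol: TransportProtocol
    # Freeform layer 7 hints; empty when the component declares none.
    application_protocols: tuple[str, ...] = ()


def to_service_port(name: str, listen_port: ListenPort) -> dict:
    """Render a ListenPort as one entry of a Kubernetes Service spec.ports list."""
    entry = {
        "name": name,
        "port": listen_port.port_number,
        "protocol": listen_port.transport_protocol.value,
    }
    # Kubernetes accepts a single appProtocol hint per port, so this sketch
    # passes along only the first declared application protocol.
    if listen_port.application_protocols:
        entry["appProtocol"] = listen_port.application_protocols[0]
    return entry


http_port = ListenPort(8081, TransportProtocol.TCP, ("http",))
syslog_port = ListenPort(5140, TransportProtocol.UDP)

print(to_service_port("listen-http", http_port))
print(to_service_port("listen-syslog", syslog_port))
```

Keeping the transport protocol as an enum while leaving application protocols
freeform mirrors the Kubernetes split between required and advisory fields.
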
A proposed draft update to the NiFi API to introduce the concept of Listen
Ports is available here:
[https://github.com/kevdoran/nifi-api/tree/listen-ports]
*Compatibility*
Largely, this is a backwards compatible change.
The proposed approach is entirely additive and optional: Once implemented and
released, extension components can opt in to declaring Listen Ports that they
create. The burden to do so is minimal for component authors; in most cases, a
few lines of code. Once done, flow registries and operational tools built atop
them can dynamically discover flows that require data ingress rules, and the
NiFi framework can dynamically discover components that provide data ingress
ports and their port configuration to make it available via the REST API for
external components. These offer incentives for component authors to opt in to
this feature, without any breaking changes that force them to update their
components to continue working on newer NiFi versions.
There is one minor breaking change to the properties for the ListenSyslog
processor. Syslog is an application protocol that can work over TCP or UDP. The
current ListenSyslog processor allows specifying the port to listen on, as well
as a second property for specifying the transport protocol to accept (TCP or
UDP). The proposed design for the NiFi API Property Descriptor would declare a
static transport protocol associated with a Listen Port. This allows knowing
the transport protocol based on a flow definition without knowing what the
runtime configuration will be, which greatly simplifies rules NiFi operators
may want to codify, such as whether a flow definition is compatible with a
target NiFi Runtime (eg, an operator may by policy block all inbound UDP
traffic for security reasons).
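
The kind of rule described above could be codified along these lines. This is
only a sketch: the record layout (component, property, transportProtocol) is an
assumption about what a tool might extract from a flow definition, and the
function name is made up for illustration.

```python
from typing import Iterable


def find_policy_violations(
    declared_ports: Iterable[dict], allowed_transports: set[str]
) -> list[dict]:
    """Return the declared Listen Ports whose transport protocol is not allowed."""
    return [
        port
        for port in declared_ports
        if port["transportProtocol"].upper() not in allowed_transports
    ]


# Illustrative records, as might be extracted from a static flow definition.
declared = [
    {"component": "Receive Events", "property": "Port", "transportProtocol": "TCP"},
    {"component": "Receive Metrics", "property": "Port", "transportProtocol": "UDP"},
]

# Example policy from the text: the target runtime blocks all inbound UDP traffic.
violations = find_policy_violations(declared, allowed_transports={"TCP"})
for violation in violations:
    print(f"{violation['component']}: {violation['transportProtocol']} ingress is not permitted")
```

Such a check only works before deployment because the transport protocol is
static per property; if it depended on a runtime property value, the flow
definition alone would not be enough to decide.
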
The proposed solution to this is replacing the ListenSyslog processor Port and
Protocol properties with TCP Port and UDP Port properties that are mutually
exclusive (only one is allowed to be configured at a time). The migrate
properties feature that was introduced for in-place flow version changes will
allow us to migrate flows from the old configuration to the new configuration
automatically for users, and migration guidance for the first NiFi release to
include the modified processor can cover the remaining cases.
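
The property mapping such a migration would need to perform can be sketched as
follows. This is a language-neutral illustration written in Python, not the
NiFi migration API; the property names are the ones used in this description
(Port, Protocol, TCP Port, UDP Port), and the choice to raise an error for
unrecognized protocol values is an assumption standing in for the cases left to
migration guidance.

```python
def migrate_listen_syslog(properties: dict[str, str]) -> dict[str, str]:
    """Map the old Port + Protocol pair onto a single TCP Port or UDP Port property."""
    migrated = dict(properties)  # leave the caller's dict untouched
    port = migrated.pop("Port", None)
    protocol = migrated.pop("Protocol", None)
    if port is None:
        # Nothing to move; the obsolete Protocol property is simply dropped.
        return migrated
    new_name = {"TCP": "TCP Port", "UDP": "UDP Port"}.get((protocol or "").upper())
    if new_name is None:
        # Missing or unrecognized protocol: needs manual attention.
        raise ValueError(f"cannot migrate Port without a TCP/UDP Protocol: {protocol!r}")
    migrated[new_name] = port
    return migrated


old_config = {"Port": "5140", "Protocol": "UDP", "Character Set": "UTF-8"}
print(migrate_listen_syslog(old_config))
```

Since exactly one of the two new properties is ever set by the mapping, the
migrated configuration satisfies the mutual-exclusivity constraint by
construction.
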
*Verification*
The Verification process for this feature would include:
- Unit tests for components that define Listen Ports to make sure that they
implement the new interfaces and that, when configured, the correct Listen Port
definition is discoverable (ie, the return value for new interfaces matches the
expected values based on the component configuration)
- Integration tests that verify that, when multiple Listen components exist in
a NiFi Cluster, the Listen Ports they create are correctly discoverable via the
NiFi REST API.
- Instructions for peer-reviewers to manually verify the feature
implementation.
- Documentation updates, primarily to the NiFi Developer Guide, to add
instructions and guidance for implementing new listen components in a manner
that is compatible with this new feature.
*Alternatives*
The following alternatives were considered:
# {_}No changes to NiFi{_}; instead put the responsibility of discovering
Listen Ports solely on external components, such as deployment scripts and
infrastructure management logic. For example, external code could just "know"
(ie, using hardcoded logic) that ListenHTTP defines a data ingress port via a
property that accepts HTTP requests and look for instances of that known
Processor type. Alternatively, we could go with a convention-based approach
such as "processors that have a type name starting with Listen* and a Port
property." Both of these are very brittle, lack discoverability by extension
component authors, and do not account for the large community of NiFi
developers that may use different conventions than those used for Apache NiFi
components. For these reasons, this alternative was deemed insufficient.
# {_}A larger change that also extends the new port discovery feature to
include the various framework-level ingress ports{_}, such as the remote
input port used for the site-to-site protocol or cluster communication ports. For
example, maybe a new framework-level concept of a NiFi Gateway allows operators
and flow authors to put all network ingress rules in one place. While this may
have some advantages, and could be considered in the future, it was ultimately
deemed too large a change at this time. It would likely include breaking
changes to configuration files and APIs, and therefore would be more
reasonable to implement as a NiFi 3.0 / major version change, should the need
ever arise. Additionally, framework-level ingress ports are defined via a
different process (usually in nifi.properties) and once set do not change
often; therefore, they are already much more manageable by something like IaC
logic and less of a problem compared to ports that are part of flow
definitions. This means that including them in the scope of this feature offers
much less value despite greatly increasing the scope.
The proposed scope and description offer the best benefit for minimal effort
with practically no breaking changes.