Kevin Doran created NIP-14:
------------------------------

             Summary: Support Dynamic Discovery of Data Ingress Ports
                 Key: NIP-14
                 URL: https://issues.apache.org/jira/browse/NIP-14
             Project: NiFi Improvement Proposal
          Issue Type: Improvement
            Reporter: Kevin Doran
            Assignee: Kevin Doran


*Motivation*

Increasingly, NiFi clusters are being deployed into environments where network 
traffic is managed through a non-NiFi component, such as a gateway, ingress 
controller, load balancer, or reverse proxy. These include cloud and 
containerized deployment environments. NiFi infrastructure is sometimes managed 
via infrastructure as code (IaC) frameworks as part of a larger system.

Additionally, many NiFi operators desire to deploy flow definitions 
programmatically via automated deployment pipelines. Flow versioning and 
promotion of flow definitions from non-production to production NiFi clusters 
have become a best practice in the community.

The combination of these factors leads to situations where it would be 
advantageous if data ingress ports defined as part of a flow definition (eg, a 
ListenHTTP processor) could be dynamically discovered by infrastructure and 
deployment software in order to automatically configure managed networking 
components responsible for managing ingress traffic to NiFi clusters.

*Scope and Description*

There are currently 9 processors and 1 controller service included in the NiFi 
source code distribution that define Listen Ports:
 - HandleHTTPRequest
 - ListenFTP
 - ListenHTTP
 - ListenOTLP
 - ListenSyslog
 - ListenTCP
 - ListenTrapSNMP
 - ListenUDP
 - ListenUDPRecord
 - JettyWebSocketServer (CS)

(Note that ListenSlack uses the Listen* naming convention, but this processor 
actually initiates a two-way TCP connection with a Slack workspace in order to 
start receiving events. It does not create a server process to accept inbound 
connections.)

Of course, it is possible that community members have developed (or could 
develop) custom NiFi extensions that create Listen Ports. Apache NiFi components 
usually serve as example or reference implementations in these cases. The goal of this feature 
is to introduce standard interfaces for declaring data ingress ports in 
components as well as framework mechanisms for a standard discovery process of 
such ports, making it possible to support Apache NiFi provided components as 
well as external, third-party extensions.

All of the example components listed above are ConfigurableComponents that 
define a Port property that establishes the numbered port that is bound to in 
order to listen for payloads and connections from external clients. A new 
Property Descriptor field in nifi-api would be a natural place to annotate that 
a property defines a Listen Port. This would allow discovery of Listen Ports 
both at runtime by the framework, as well as in flow definitions, which is very 
advantageous as it allows determining network ingress requirements based solely 
on the static flow definition before it has even been deployed into a NiFi 
runtime environment.
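
As a rough illustration of the idea above, the following self-contained sketch models a property descriptor that can be flagged as a Listen Port and shows how flagged properties could be found from descriptors alone. It does not use nifi-api: SketchDescriptor, listenPortTransport, and findListenPortProperties are hypothetical names chosen for this example, and the property names are illustrative. The draft branch linked later in this proposal is the reference for the actual proposed API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a nifi-api PropertyDescriptor; not the real class.
record SketchDescriptor(String name, Optional<String> listenPortTransport) {

    // An ordinary property with no Listen Port metadata.
    static SketchDescriptor plain(String name) {
        return new SketchDescriptor(name, Optional.empty());
    }

    // A property whose value is a port number bound with the given transport.
    static SketchDescriptor listenPort(String name, String transport) {
        return new SketchDescriptor(name, Optional.of(transport));
    }
}

class ListenPortScan {

    // Returns the names of the properties flagged as Listen Ports.
    static List<String> findListenPortProperties(List<SketchDescriptor> descriptors) {
        return descriptors.stream()
                .filter(d -> d.listenPortTransport().isPresent())
                .map(SketchDescriptor::name)
                .toList();
    }

    public static void main(String[] args) {
        // Illustrative descriptors for an HTTP listener style component.
        List<SketchDescriptor> descriptors = List.of(
                SketchDescriptor.plain("Base Path"),
                SketchDescriptor.listenPort("Listening Port", "TCP"));
        System.out.println(findListenPortProperties(descriptors));
    }
}
```

Because the flag lives on the descriptor rather than on a configured value, the same scan could work against a static flow definition as well as a running component.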

In addition to the Property Descriptor changes in NiFi API described above, 
the NiFi Framework and NiFi REST API would be modified to dynamically discover 
configurable components (ie, processors and controller services) containing 
Listen Ports and list those components along with their current configuration 
programmatically.
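
The discovery step described above could look roughly like the sketch below: walk the configured components, and for each property flagged as a Listen Port, pair the declared transport with the currently configured port value. ComponentView, DiscoveredPort, and discover are hypothetical names for this example and are not part of the NiFi framework or REST API. The sketch also assumes port values are literal numbers, not expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical view of a configured component: which properties are Listen Ports
// (property name -> transport) and the component's current property values.
record ComponentView(String id, String type,
                     Map<String, String> listenPortTransports,
                     Map<String, String> propertyValues) {}

// One discovered ingress port, as an external tool might receive it.
record DiscoveredPort(String componentId, String componentType, int port, String transport) {}

class ListenPortDiscovery {

    // Collects every configured Listen Port across the given components.
    static List<DiscoveredPort> discover(List<ComponentView> components) {
        List<DiscoveredPort> found = new ArrayList<>();
        for (ComponentView c : components) {
            for (Map.Entry<String, String> e : c.listenPortTransports().entrySet()) {
                String value = c.propertyValues().get(e.getKey());
                if (value == null || value.isBlank()) {
                    continue; // port property not configured yet
                }
                // Assumption: the value is a literal port number.
                int port = Integer.parseInt(value.trim());
                found.add(new DiscoveredPort(c.id(), c.type(), port, e.getValue()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<ComponentView> components = List.of(
                new ComponentView("1", "ListenHTTP",
                        Map.of("Listening Port", "TCP"), Map.of("Listening Port", "8080")),
                new ComponentView("2", "UpdateAttribute", Map.of(), Map.of()));
        discover(components).forEach(System.out::println);
    }
}
```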

When looking at how a Listen Port should be defined, the most important aspect 
is the layer 4 transport protocol, as that is usually the most relevant 
information required to automatically configure external networking components 
such as gateways, ingress controllers, load balancers, and proxies. Of 
secondary importance is the layer 7 application protocol, if any. Looking at 
existing projects that have solved similar problems, we can take Kubernetes as 
an example. Kubernetes allows services, which can be arbitrary containers 
running any process, to declare exposed ports. For service ports, the 
Kubernetes network data model just allows for declaring port number and 
[transport 
protocol|https://kubernetes.io/docs/reference/networking/service-protocols/] 
using well-defined enum values. Optionally, [application 
protocols|https://kubernetes.io/docs/concepts/services-networking/service/#application-protocol]
 can be provided as a hint using freeform strings and, when available, can be 
used by the provider for richer support of the app protocol. This data model 
serves as a good guide for NiFi to model a similar situation, just replacing 
services with NiFi extensions.
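
To make the shape of such a data model concrete, here is a minimal sketch: a closed enum for the layer 4 transport and freeform strings for optional layer 7 hints. TransportProtocol, ListenPort, and their fields are hypothetical names for illustration only, and the enum covers just the two transports discussed in this proposal.

```java
import java.util.List;

// Closed set of layer 4 transports, declared as well-defined enum values.
enum TransportProtocol { TCP, UDP }

// Hypothetical Listen Port definition: port number, transport, and optional
// freeform application protocol hints (may be empty).
record ListenPort(int portNumber, TransportProtocol transport, List<String> applicationProtocols) {
    ListenPort {
        if (portNumber < 0 || portNumber > 65535) {
            throw new IllegalArgumentException("port out of range: " + portNumber);
        }
        applicationProtocols = List.copyOf(applicationProtocols); // defensive, immutable copy
    }
}

class ListenPortModelDemo {
    public static void main(String[] args) {
        ListenPort http = new ListenPort(8443, TransportProtocol.TCP, List.of("http/1.1", "h2"));
        ListenPort raw = new ListenPort(5140, TransportProtocol.UDP, List.of());
        System.out.println(http);
        System.out.println(raw);
    }
}
```

Keeping the transport as an enum lets tooling match on it reliably, while the application protocol stays a hint that consumers are free to ignore.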

A proposed draft update to the NiFi API to introduce the concept of Listen 
Ports is available here:
[https://github.com/kevdoran/nifi-api/tree/listen-ports] 

*Compatibility*

Largely, this is a backwards compatible change.

The proposed approach is entirely additive and optional: Once implemented and 
released, extension components can opt-in to declaring Listen Ports that they 
create. The burden to do so is minimal for component authors; in most cases, a 
few lines of code. Once done, flow registries and operational tools built atop 
them can dynamically discover flows that require data ingress rules, and the 
NiFi framework can dynamically discover components that provide data ingress 
ports and their port configuration to make it available via the REST API for 
external components. These offer incentives for component authors to opt-in to 
this feature, without any breaking changes that force them to update their 
components to continue working on newer NiFi versions.

There is one minor breaking change to the properties for the ListenSyslog 
processor. Syslog is an application protocol that can work over TCP or UDP. The 
current ListenSyslog processor allows specifying the port to listen on as well 
as a second property for specifying the transport protocol to accept (TCP or 
UDP). The proposed design for the NiFi API Property Descriptor would declare a 
static transport protocol associated with a Listen Port. This allows knowing 
the transport protocol based on a flow definition without knowing what the 
runtime configuration will be, which greatly simplifies rules NiFi operators 
may want to codify, such as whether a flow definition is compatible with a 
target NiFi Runtime (eg, an operator may by policy block all inbound UDP 
traffic for security reasons).
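
A policy rule of the kind mentioned above might be sketched as follows. Because the transport is declared statically, the check needs only the flow definition and no configured port numbers. DeclaredPort, IngressPolicy, and violations are hypothetical names for this example, and the component and property names are illustrative.

```java
import java.util.List;
import java.util.Set;

enum Transport { TCP, UDP }

// Hypothetical record of one Listen Port property declared in a flow definition.
// No port number is needed: the transport is known from the declaration alone.
record DeclaredPort(String componentType, String propertyName, Transport transport) {}

class IngressPolicy {

    // Returns the declared ports whose transport the target environment does not allow.
    static List<DeclaredPort> violations(List<DeclaredPort> declared, Set<Transport> allowed) {
        return declared.stream()
                .filter(p -> !allowed.contains(p.transport()))
                .toList();
    }

    public static void main(String[] args) {
        List<DeclaredPort> flow = List.of(
                new DeclaredPort("ListenHTTP", "Listening Port", Transport.TCP),
                new DeclaredPort("ListenUDP", "Port", Transport.UDP));
        // Example policy: only TCP ingress is permitted.
        List<DeclaredPort> blocked = violations(flow, Set.of(Transport.TCP));
        blocked.forEach(p -> System.out.println("Not allowed: " + p));
    }
}
```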

The proposed solution to this is replacing the ListenSyslog processor Port and 
Protocol properties with TCP Port and UDP Port properties that are mutually 
exclusive (only one is allowed to be configured at a time). The migrate 
properties feature that was introduced for in-place flow version changes will 
allow us to migrate flows from the old configuration to the new configuration 
automatically for users, and migration guidance for the first NiFi release to 
include the modified processor can cover the remaining cases.
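
The mapping from the old configuration to the new one could be as simple as the sketch below. It operates on a plain Map so that it stays self-contained; a real implementation would presumably go through the processor's property migration hook. The class and method names are hypothetical, and treating any Protocol value other than an explicit TCP as UDP is an assumption of this sketch, not a decided behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SyslogPropertyMigration {

    // Returns a copy of the configuration with Port/Protocol replaced by
    // exactly one of "TCP Port" or "UDP Port".
    static Map<String, String> migrate(Map<String, String> oldConfig) {
        Map<String, String> migrated = new LinkedHashMap<>(oldConfig);
        String port = migrated.remove("Port");
        String protocol = migrated.remove("Protocol");
        if (port != null) {
            // Assumption for this sketch: only an explicit "TCP" selects the TCP property.
            String target = "TCP".equalsIgnoreCase(protocol) ? "TCP Port" : "UDP Port";
            migrated.put(target, port);
        }
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, String> old = Map.of("Port", "6514", "Protocol", "TCP");
        System.out.println(migrate(old));
    }
}
```

Since only one of the two new properties is ever written, the migrated configuration satisfies the mutual exclusivity rule by construction.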

*Verification*

The Verification process for this feature would include:
 - Unit tests for components that define Listen Ports to make sure that they 
implement the new interfaces, and that when configured, the correct Listen Port 
definition is discoverable (ie, the return value for new interfaces matches the 
expected values based on the component configuration)
 - Integration tests that verify when multiple Listen components exist in a 
NiFi Cluster, the Listen Ports they create are correctly discoverable via the 
NiFi REST API.
 - Instructions for peer-reviewers to manually verify the feature 
implementation.
 - Documentation updates, primarily to the NiFi Developer Guide, to add 
instructions and guidance for implementing new listen components in a manner 
that is compatible with this new feature.

*Alternatives*

The following alternatives were considered:
 # {_}No changes to NiFi{_}; instead put the responsibility of discovering 
Listen Ports solely on external components, such as deployment scripts and 
infrastructure management logic. For example, external code could just "know" 
(ie, using hardcoded logic) that ListenHTTP defines a data ingress port via a 
property that accepts HTTP requests and look for instances of that known 
Processor type. Alternatively, we could go with a convention-based approach 
such as "processors that have a type name starting with Listen* and a Port 
property." Both of these are very brittle, lack discoverability by extension 
component authors, and do not account for the large community of NiFi 
developers that may use different conventions than those used for Apache NiFi 
components. For these reasons, this alternative was deemed insufficient.
 # {_}A larger change that also extends the new port discovery feature 
to include the various framework-level ingress ports{_}, such as the remote 
input port used for site-to-site protocol or cluster communication ports. For 
example, maybe a new framework-level concept of a NiFi Gateway allows operators 
and flow authors to put all network ingress rules in one place. While this may 
have some advantages, and could be considered in the future, it was ultimately 
deemed too large a change at this time. It likely would include breaking 
changes to configuration files and APIs, and therefore would be more 
reasonable to implement as a NiFi 3.0 / major version change, should the need 
ever arise. Additionally, framework-level ingress ports are defined via a 
different process (usually in nifi.properties) and once set do not change 
often; therefore, they are already much more manageable by something like IaC 
logic and less of a problem compared to ports that are part of flow 
definitions. This means that including them in the scope of this feature offers 
much less value despite greatly increasing the scope.

The proposed scope and description offer the most benefit for minimal effort 
with practically no breaking changes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
