Github user bbende commented on the pull request:
https://github.com/apache/flink/pull/1198#issuecomment-144771981
I realized I didn't fully answer your question about how that works in a
cluster... the SiteToSiteClient knows about all the nodes in the NiFi cluster
and will pull from the output port on each node. The same concept applies on
the sending side: it distributes the data to different nodes in the cluster,
sending more data to nodes that are considered less busy. The SiteToSiteClient is what
we use internally for two NiFi instances/clusters to communicate with each
other.
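
To make "sending more data to nodes that are considered less busy" concrete,
here is a minimal, self-contained sketch of one possible load-aware split. It
is not NiFi's actual algorithm and does not use the NiFi API: the class name,
the `distribute` helper, and the weighting rule (share proportional to
1 / (1 + queued items)) are all assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of NiFi or Flink.
class LoadAwareDistribution {

    // Splits batchSize items across nodes so that nodes with fewer queued
    // items receive a larger share. Assumes queued counts are non-negative.
    // Assumed weighting rule: weight = 1 / (1 + queued).
    static Map<String, Integer> distribute(Map<String, Integer> queuedPerNode, int batchSize) {
        double totalWeight = 0.0;
        for (int queued : queuedPerNode.values()) {
            totalWeight += 1.0 / (1 + queued);
        }

        Map<String, Integer> shares = new LinkedHashMap<>();
        int assigned = 0;
        String leastBusy = null;
        int minQueued = Integer.MAX_VALUE;

        for (Map.Entry<String, Integer> e : queuedPerNode.entrySet()) {
            double weight = 1.0 / (1 + e.getValue());
            // Round down; any leftover is handed out after the loop.
            int share = (int) Math.floor(batchSize * weight / totalWeight);
            shares.put(e.getKey(), share);
            assigned += share;
            if (e.getValue() < minQueued) {
                minQueued = e.getValue();
                leastBusy = e.getKey();
            }
        }

        // Give the rounding remainder to the least busy node.
        if (leastBusy != null) {
            shares.merge(leastBusy, batchSize - assigned, Integer::sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Integer> queued = new LinkedHashMap<>();
        queued.put("node1", 0); // idle
        queued.put("node2", 1);
        queued.put("node3", 3); // busiest
        System.out.println(distribute(queued, 70));
    }
}
```

In this sketch the idle node gets the largest share and the busiest node the
smallest, while the shares always add up to the full batch.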
Regarding how they know they belong to the same system: technically they
don't, and other non-Flink clients could pull from that same port. We have to
rely on an organization that uses Flink and NiFi ensuring that the port is
only used by this specific Flink streaming process. There could also be
multiple output ports to support multiple Flink streaming processes pulling.
There is also the option to secure a NiFi instance with SSL, and then
provide SSL credentials to the SiteToSiteClient in order to lock down NiFi so
that only authorized clients can pull/push data.