GitHub user bbende commented on the pull request:

    https://github.com/apache/flink/pull/1198#issuecomment-144771981
  
    I realized I didn't fully answer your question about how that works in a 
cluster. The SiteToSiteClient knows about all the nodes in the NiFi cluster 
and will pull from the output port on each node. The same concept applies on 
the sending side: it distributes the data across the nodes in the cluster, 
sending more data to nodes that are considered less busy. The SiteToSiteClient 
is what we use internally for two NiFi instances/clusters to communicate with 
each other.
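
    To make the "more data to less busy nodes" idea concrete, here is a 
minimal, self-contained sketch. It is not NiFi's actual algorithm and uses no 
NiFi API: the class name LeastBusyDistributor, the allocate method, and the 
use of a per-node queue count as the "busyness" measure are all assumptions 
made up for illustration. Each batch simply goes to the node with the 
smallest current load.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the SiteToSiteClient implementation.
public class LeastBusyDistributor {

    /**
     * Assigns each batch to the node with the smallest current load, where
     * load = reported queue size + batches already assigned in this call.
     * Ties go to the node that appears first in the input map.
     */
    public static Map<String, Integer> allocate(Map<String, Integer> queuedByNode, int batches) {
        if (queuedByNode.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, Integer> load = new LinkedHashMap<>(queuedByNode);
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (String node : queuedByNode.keySet()) {
            assigned.put(node, 0);
        }

        for (int i = 0; i < batches; i++) {
            // Find the least-loaded node for this batch.
            String target = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (target == null || e.getValue() < load.get(target)) {
                    target = e.getKey();
                }
            }
            load.merge(target, 1, Integer::sum);
            assigned.merge(target, 1, Integer::sum);
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Assumed queue sizes reported by three hypothetical nodes.
        Map<String, Integer> queued = new LinkedHashMap<>();
        queued.put("node-a", 8);
        queued.put("node-b", 2);
        queued.put("node-c", 5);
        // prints {node-a=1, node-b=6, node-c=3}
        System.out.println(allocate(queued, 10));
    }
}
```

    In this sketch the least busy node (node-b) receives most of the batches 
until the loads even out, which is the general behavior described above; the 
real client may weigh nodes quite differently.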
    
    Regarding how they know they belong to the same system: technically they 
don't, and other non-Flink clients could pull from that same port. We have to 
rely on an organization that uses both Flink and NiFi ensuring that the port 
is used only by this specific Flink streaming process. There can also be 
multiple output ports to support multiple Flink streaming processes pulling.
    
    There is also the option to secure a NiFi instance with SSL and then 
provide SSL credentials to the SiteToSiteClient in order to lock down NiFi so 
that only authorized clients can pull/push data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
