[
https://issues.apache.org/jira/browse/FLINK-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940011#comment-14940011
]
ASF GitHub Bot commented on FLINK-2740:
---------------------------------------
Github user bbende commented on the pull request:
https://github.com/apache/flink/pull/1198#issuecomment-144771981
I realized I didn't fully answer your question about how that works in a
cluster... the SiteToSiteClient knows about all the nodes in the NiFi cluster
and will pull from the output port on each node. The same concept applies on
the sending side: it will distribute the data to different nodes in the
cluster, sending more data to nodes that are considered less busy. The
SiteToSiteClient is what
we use internally for two NiFi instances/clusters to communicate with each
other.
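As a rough illustration of the "more data to less busy nodes" idea, here is a
minimal, self-contained Java sketch. It is not NiFi's actual peer-selection
code: the class name LeastBusySelector, the use of a queued-item count as the
load metric, and the inverse-backlog weighting are all assumptions made for
illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, NOT part of NiFi: picks a destination node so that
// nodes with a smaller backlog receive proportionally more batches.
class LeastBusySelector {

    // Weight each node inversely to its backlog (assumed load metric):
    // the busiest node gets weight 1, an idle node gets maxLoad + 1.
    public static Map<String, Integer> weights(Map<String, Integer> queuedPerNode) {
        int maxLoad = 0;
        for (int load : queuedPerNode.values()) {
            maxLoad = Math.max(maxLoad, load);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : queuedPerNode.entrySet()) {
            result.put(e.getKey(), maxLoad - e.getValue() + 1);
        }
        return result;
    }

    // Choose a node for the next batch. `roll` is expected to be in [0, 1),
    // e.g. the result of Random.nextDouble(). Returns null for an empty map.
    public static String pick(Map<String, Integer> queuedPerNode, double roll) {
        Map<String, Integer> w = weights(queuedPerNode);
        int total = 0;
        for (int v : w.values()) {
            total += v;
        }
        double target = roll * total;
        int cumulative = 0;
        String last = null;
        for (Map.Entry<String, Integer> e : w.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (target < cumulative) {
                return last;
            }
        }
        return last; // fallback if roll was outside [0, 1)
    }

    public static void main(String[] args) {
        // Assumed backlog per node; the node names are placeholders.
        Map<String, Integer> queued = new LinkedHashMap<>();
        queued.put("node-1", 0);
        queued.put("node-2", 50);
        queued.put("node-3", 100);

        // Simulate 1000 batches and count how many each node receives.
        Random rng = new Random(42);
        Map<String, Integer> sent = new LinkedHashMap<>();
        for (int i = 0; i < 1000; i++) {
            sent.merge(pick(queued, rng.nextDouble()), 1, Integer::sum);
        }
        System.out.println(sent);
    }
}
```

With the backlog above, the idle node should receive roughly two thirds of the
batches and the busiest node almost none. A real client would refresh the load
figures periodically rather than using a fixed snapshot.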
Regarding how they know they belong to the same system: technically they
don't, and other non-Flink clients could pull from that same port. We have to
hope that an organization using Flink and NiFi would ensure that the port is
only used by this specific Flink streaming process. There could also be
multiple output ports to support multiple Flink streaming processes pulling.
There is also the option to secure a NiFi instance with SSL, and then
provide SSL credentials to the SiteToSiteClient in order to lock down NiFi so
that only authorized clients can pull/push data.
> Create data consumer for Apache NiFi
> ------------------------------------
>
> Key: FLINK-2740
> URL: https://issues.apache.org/jira/browse/FLINK-2740
> Project: Flink
> Issue Type: New Feature
> Components: Streaming Connectors
> Reporter: Kostas Tzoumas
> Assignee: Joseph Witt
>
> Create a connector to Apache NiFi to create Flink DataStreams from NiFi flows