The PostHTTP processor has an option to send data as a FlowFile to a ListenHTTP processor on another NiFi. This allows you to keep the FlowFile attributes across multiple NiFis, just like S2S.
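For reference, a minimal PostHTTP-to-ListenHTTP pairing might look like the following. The property names and values are best-effort placeholders rather than a definitive configuration, so verify them against the documentation for your NiFi version, and substitute your own host, port, and path:

    PostHTTP (on the sending NiFi):
        URL              = http://receiving-host:8081/contentListener   (placeholder host/port/path)
        Send as FlowFile = true   (packages content and attributes together so attributes survive the hop)

    ListenHTTP (on the receiving NiFi):
        Listening Port   = 8081
        Base Path        = contentListener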
On Nov 25, 2015 1:58 PM, "Matthew Gaulin" <[email protected]> wrote:

> Ok, that all makes sense. The main reason we like doing it strictly as S2S
> is to maintain the FlowFile attributes, so we would like to avoid HTTP.
> Otherwise we would have to rebuild some of these attributes from the
> content, which isn't the end of the world, but still no fun. We may
> consider the idea of the single receive node for distribution to a
> cluster, in order to further lock things down from a firewall standpoint.
> I think the main thing we had to wrap our heads around was that every send
> node needs to be able to directly connect to every receiver node. Thanks
> again for the very detailed responses!
>
> On Wed, Nov 25, 2015 at 10:44 AM Matthew Clarke <[email protected]> wrote:
>
> > I am not following why you set all your Nodes (source and destination)
> > to use the same hostname(s). Each hostname resolves to a single IP, and
> > by doing so doesn't all data get sent to a single end-point?
> >
> > The idea behind spreading out the connections when using S2S is smart
> > load balancing. If all data going to another cluster passed through the
> > NCM first, you would lose that load-balancing capability because one
> > instance of NiFi (the NCM in this case) would have to receive all that
> > network traffic. It sounds like the approach you want is to send source
> > data to a single NiFi point on another network and then have that single
> > point redistribute the data internally across multiple "processing"
> > nodes in a cluster.
> >
> > This can be accomplished in several ways:
> >
> > 1. You could use S2S to send to a single instance of NiFi on the other
> > network and then have that instance S2S the data to a cluster on that
> > same network.
> > 2. You could use the PostHTTP (source NiFi) and ListenHTTP (destination
> > NiFi) processors to send data to a single Node in the destination
> > cluster, and then have that Node use S2S to redistribute the data
> > across the entire cluster.
> >
> > A more ideal setup to limit the connections needed between networks
> > might be:
> >
> > - Source network: a cluster (consisting of numerous low-end servers or
> > VMs) plus a single instance running on a beefy server/VM that handles
> > all data coming in and out of that network. Use S2S to communicate
> > between the internal cluster and the single instance on the same
> > network.
> > - The destination network would be set up the same way. You can then
> > use S2S, or PostHTTP to ListenHTTP, to send data as NiFi FlowFiles
> > between your networks. That network-to-network data transfer should
> > occur between the two beefy single instances in each network.
> >
> > Matt
> >
> > On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin <[email protected]> wrote:
> >
> > > Thank you for the info. I was working with Edgardo on this. We ended
> > > up having to set the SAME hostnames on each of the source nodes that
> > > the destination NCM uses for each of its nodes, and of course open up
> > > the firewall rules so all source nodes can talk to each destination
> > > node. This seems to jive with what you explained above. It is a
> > > little annoying that we have to have so much open to get this to work
> > > and can't have a single point of entry on the NCM to send all this
> > > data from one network to another. Not a huge deal in the end though.
> > > Thanks again.
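The two-hop layout Matt describes above could be sketched roughly as follows. The hostnames are placeholders, and either S2S or a PostHTTP-to-ListenHTTP pairing could be used for the middle hop:

    source cluster nodes
        --S2S--> edge instance on the source network (e.g. edge-a.example.com)
        --S2S or PostHTTP->ListenHTTP--> edge instance on the destination network (e.g. edge-b.example.com)
        --S2S--> destination cluster nodes

Only the two edge instances need to reach each other across the network boundary, which keeps the firewall openings limited to that single pair of endpoints instead of every send node talking to every receive node.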
> > > On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke <[email protected]> wrote:
> > >
> > > > Let me explain first how S2S works when connecting from one cluster
> > > > to another cluster.
> > > >
> > > > I will start with the source cluster (this would be the cluster
> > > > where you are adding the Remote Process Group (RPG) to the graph).
> > > > The NCM has no role in this cluster. Every Node in a cluster works
> > > > independently from one another, so by adding the RPG to the graph,
> > > > you have added it to every Node. So now the behavior of each Node
> > > > is the same as it would be if it were a standalone instance with
> > > > regards to S2S. The URL you are providing in that RPG is the URL of
> > > > the NCM of the target cluster (this URL is not the S2S port of the
> > > > NCM; it is the same URL you would use to access the UI of that
> > > > cluster). Now each Node in your "source" cluster is communicating
> > > > with the NCM of the destination cluster, unaware at this time that
> > > > it is talking to an NCM. These Nodes want to send their data to the
> > > > S2S port on that NCM. Of course, since the NCM does not process any
> > > > data, it is not going to accept any data from those Nodes. Instead,
> > > > the "destination" NCM responds to each of the "source" Nodes with
> > > > the configured nifi.remote.input.socket.host=,
> > > > nifi.remote.input.socket.port=, and the status for each of the
> > > > "destination" Nodes. Using that information, the source Nodes can
> > > > logically distribute the data to the "destination" Nodes.
> > > >
> > > > When S2S fails beyond the initial URL connection, there are
> > > > typically only a few likely causes:
> > > > 1. There is a firewall preventing communication between the source
> > > > Nodes and the destination Nodes on the S2S ports.
> > > > 2. No value was supplied for nifi.remote.input.socket.host= on each
> > > > of the target Nodes. When no value is provided, whatever the
> > > > "hostname" command returns is what is sent. In many cases this
> > > > hostname may end up being "localhost" or some other value that is
> > > > not resolvable/reachable by the "source" systems.
> > > >
> > > > You can change the logging for S2S to DEBUG to see more detail
> > > > about the message traffic between the "destination" NCM and the
> > > > "source" Nodes by adding the following line to the logback.xml
> > > > files:
> > > >
> > > > <logger name="org.apache.nifi.remote" level="DEBUG"/>
> > > >
> > > > Watch the logs on one of the source Nodes specifically to see what
> > > > hostname and port are being returned for each destination Node.
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > > On Wed, Nov 25, 2015 at 7:59 AM, Matthew Clarke <[email protected]> wrote:
> > > >
> > > > > On Tue, Nov 24, 2015 at 1:38 PM, Edgardo Vega <[email protected]> wrote:
> > > > >
> > > > > > Yeah, the S2S port is set on all nodes.
> > > > > >
> > > > > > What should the host be set to on each machine? I first set it
> > > > > > to the NCM IP on each machine in the cluster. Then I set the
> > > > > > host to be the IP of each individual machine, without luck.
> > > > > >
> > > > > > The S2S port is open to the internet for the entire cluster.
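A minimal sketch of the S2S settings Matt refers to, as they would appear in nifi.properties on each destination Node. The hostname and port below are placeholders only; use a name and port that the source nodes can actually resolve and reach:

    # nifi.properties on a destination Node (example values)
    nifi.remote.input.socket.host=node1.dest.example.com
    nifi.remote.input.socket.port=10443

If nifi.remote.input.socket.host is left blank, the node advertises whatever the local "hostname" command returns, which leads to exactly the "localhost"-style failure described above.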
> > > > > > On Tue, Nov 24, 2015 at 1:35 PM, Matthew Clarke <[email protected]> wrote:
> > > > > >
> > > > > > > Did you configure the S2S port on all the Nodes in the
> > > > > > > cluster you are trying to S2S to?
> > > > > > >
> > > > > > > In addition to setting the port on those Nodes, you should
> > > > > > > also set the S2S hostname. The hostname entered should be
> > > > > > > resolvable and reachable by the systems trying to S2S to that
> > > > > > > cluster.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Matt
> > > > > > >
> > > > > > > On Tue, Nov 24, 2015 at 1:29 PM, Edgardo Vega <[email protected]> wrote:
> > > > > > >
> > > > > > > > Trying to get site-to-site working from one cluster to
> > > > > > > > another. It works if the connection goes from cluster to
> > > > > > > > single node, but not clustered to clustered.
> > > > > > > >
> > > > > > > > I was looking at JIRA and saw this ticket
> > > > > > > > https://issues.apache.org/jira/browse/NIFI-872.
> > > > > > > >
> > > > > > > > Is this saying I am out of luck, or is there some special
> > > > > > > > config that I must do to make this work?
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Edgardo
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > >
> > > > > > Edgardo
