Thanks, Mark and Tijo, for your opinions.

Yes, I am looking into some way of grouping the nodes. But even in that case, 
each processor should run on only one node.
A few examples of the processors of this type that we have:
1. Execute a Spark job -> invoke the Spark job and track it
2. Invoke DistCp in a Hadoop cluster -> invoke the job and track it
3. Invoke any other YARN/MR job -> invoke the job and track it

In summary, I want to partition the cluster into two types of nodes:
1. Nodes handling the single-execution processors (multiple nodes, but each 
processor runs on only one of them)
2. Normal data processing nodes (these are like the current NiFi nodes).

So grouping the nodes is one idea. But even within a group, processor 
execution needs to be restricted to a single node.

As background, our plan is to deploy a kind of pipeline service in which 
these two types of nodes need to be deployed and managed separately.
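
To make the "one processor, exactly one node within a group" requirement 
concrete, here is a minimal sketch in Python. It is not NiFi code and not an 
existing NiFi feature. It assumes rendezvous (highest-random-weight) hashing as 
the distribution algorithm, which is only one possible choice, and it assumes 
the list of live nodes comes from some membership service (for example 
ZooKeeper), which is not shown. The node names, processor names, and function 
names are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Sequence


def _score(node: str, processor: str) -> int:
    """Deterministic pseudo-random weight for a (node, processor) pair."""
    digest = hashlib.sha256(f"{node}|{processor}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_owner(processor: str, live_nodes: Sequence[str]) -> str:
    """Pick the single node in the group that should run this processor.

    Every node that sees the same live_nodes list computes the same owner,
    so no extra coordination is needed beyond agreeing on membership.
    """
    if not live_nodes:
        raise ValueError("no live nodes in the single-execution group")
    return max(live_nodes, key=lambda node: _score(node, processor))


def assign_all(processors: Iterable[str], live_nodes: Sequence[str]) -> Dict[str, str]:
    """Map every single-execution processor to exactly one owning node."""
    return {p: assign_owner(p, live_nodes) for p in processors}


if __name__ == "__main__":
    # Hypothetical single-execution group and processors.
    group = ["exec-node-1", "exec-node-2", "exec-node-3"]
    processors = ["spark-job", "distcp-job", "yarn-mr-job"]
    for name, owner in assign_all(processors, group).items():
        print(f"{name} -> {owner}")
```

With this scheme, if one node leaves the group, only the processors it owned 
move to another node; the rest keep their current owner. Real failover would 
still need the membership list to be updated reliably, which is outside this 
sketch.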



-----Original Message-----
From: Tijo Thomas [] 
Sent: Wednesday, September 21, 2016 10:20 PM
Subject: Re: enforce run only in promary node $ multiple primary node

Changing the concept of "Run on Primary Node" to "Run on Only One Node" will 
not solve the problem. Named grouping constructs would be a better option.

Our use case is also similar. We have many tasks that must run on only one 
node, and we want to distribute that load. If we could have a list of primary 
nodes to distribute the load across, it would solve our problem.

    On Wednesday, 21 September 2016 6:01 PM, "" 
<> wrote:


I'd like to hear more about your use case, as from the description given, I'm 
not sure that this all would need to run on a primary node. Generally, you want 
only "source processors" to run on primary node.

One thing that I've been thinking about, though, is changing the concept of 
"Run on Primary Node" to a "Run on Only One Node." The concern there is that we 
will have cases where a few processors have to run on the same node. So we 
would need a mechanism for supporting that. Perhaps some sort of named grouping 



> On Sep 21, 2016, at 5:07 AM, Nijel s f <> wrote:
> Hi all,
> In support of Tijo’s thought, we have one scenario.
> We are trying to use NiFi for a data pipeline solution. The scenario is to 
> coordinate between various services and provide a solution for big data 
> analysis.
> In our scenario, many of the activities are "run on primary" style 
> processors. These are implemented on top of various components such as 
> YARN, HBase, Spark, databases, etc.
> One issue we are seeing is that all of these processors (Spark execution, 
> YARN/MR job execution, etc.) have to run on the primary node, and there is 
> only one primary node.
> We are thinking of having multiple primary nodes and assigning the 
> activities to them using some distribution algorithm.
> The idea is to handle the coordination and failover mechanism using 
> ZooKeeper.
> Any thoughts on this?
> Regards
> Nijel
> From: Jeff []
> Sent: Monday, September 19, 2016 11:17 PM
> To: Tijo Thomas;
> Subject: Re: enforce run only in promary node $ multiple primary node
> Tijo,
> To give you some information on your second question, you can design your 
> flow to redistribute the flowfiles coming out of your processors to other 
> nodes in the cluster for processing.  There are several examples of how to 
> do this on various blogs/email lists/etc., and I just grabbed one for reference, 
> written by Apache NiFi's own Bryan Bende: 
> Please review that thread and let us know if you have further questions!
> On Mon, Sep 19, 2016 at 1:19 PM Tijo Thomas 
> <<>> wrote:
> Hi,
> 1. While writing a processor, is it possible to enforce that it runs only on 
> the primary node? I saw a Jira for this, but it appears to be unresolved.
> [NIFI-543] Provide extensions a way to indicate that they can run only on 
> primary node, if clustered - ASF JIRA
> 2. Currently my primary node is heavily loaded, as I have many processors 
> that run only on the primary node. Is it possible to define multiple 
> primary nodes? Or is it possible to configure processors not to run on the 
> primary node?
> Tijo

