[ 
https://issues.apache.org/jira/browse/NIFI-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718601#comment-16718601
 ] 

Josef Zahner commented on NIFI-5882:
------------------------------------

[~ijokarumawak] maybe my comment about the order was a bit unclear. In 
our case it is not a must that p3 gets processed before p4, but from a 
distribution point of view it looks better for us if the files are distributed 
as shown below (or as in queue_with_two_processors.png) rather than fully 
randomly, as long as they are sorted within each queue.
 * qL: p1, p4
 * qR1: p2, p5
 * qR2: p3, p6
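
To make the distribution above concrete, here is a minimal Python sketch of a plain round-robin assignment over an already sorted list. It only illustrates the layout I mean; it is not how NiFi implements load balancing internally, and the function name round_robin is made up for this example.

```python
def round_robin(files, queues):
    """Assign already-sorted files to queues in turn.

    Because the input is sorted and each queue only ever receives
    items in input order, every queue stays sorted on its own.
    """
    assignment = {q: [] for q in queues}
    for i, f in enumerate(files):
        assignment[queues[i % len(queues)]].append(f)
    return assignment


# p1..p6 are the files in priority order, as in the list above.
files = ["p1", "p2", "p3", "p4", "p5", "p6"]
result = round_robin(files, ["qL", "qR1", "qR2"])
for queue, items in result.items():
    print(queue, items)
```

With three queues this gives qL the files p1 and p4, qR1 the files p2 and p5, and qR2 the files p3 and p6.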

The background is the following: we are storing the data in a DB with two 
primary keys. One is the data creation time and the other is an incremental 
number. We are now parsing files from several servers (so we see the same 
timestamp a lot), but our incremental number only counts from 0 to 10 million 
and then starts again from 0. So the more data we get with the same timestamp, 
the more likely it is that we see the same timestamp and the same incremental 
number twice (due to an overflow). Additionally, the incremental number range 
is different on each cluster node, so if we achieve a fully ordered 
distribution we wouldn't hit an overflow with the same timestamp as quickly. In 
the end, if we don't have more than 10 million records with the same timestamp, 
it doesn't matter which method (one-processor queue or two-processor queue) we 
use; it would work either way.
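
The overflow problem can be sketched in a few lines of Python. This is only an illustration under assumptions: I assume the pair (timestamp, incremental number) has to be unique, and I use a tiny wrap limit instead of 10 million so the effect is visible. The name has_key_collision and its parameters are hypothetical, not part of our flow.

```python
def has_key_collision(num_records, wrap_at, timestamp=0):
    """Return True if num_records rows sharing one timestamp produce a
    duplicate (timestamp, counter) pair when the counter wraps at wrap_at.

    wrap_at is an assumed stand-in for the real counter range.
    """
    seen = set()
    for n in range(num_records):
        key = (timestamp, n % wrap_at)  # counter restarts at 0 after wrap_at values
        if key in seen:
            return True
        seen.add(key)
    return False


# Up to wrap_at records with the same timestamp are fine; one more collides.
print(has_key_collision(10, wrap_at=10))
print(has_key_collision(11, wrap_at=10))
```

The first call reports no collision and the second one does, which is the same situation as exceeding the counter range with a single timestamp.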

Thanks a lot for your explanations.

 

> Connector Prioritizers doesn't work together with Load Balance Strategy
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5882
>                 URL: https://issues.apache.org/jira/browse/NIFI-5882
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.8.0
>         Environment: Centos 7.5, Secured 8 Node NiFi Cluster
>            Reporter: Josef Zahner
>            Priority: Major
>         Attachments: connector_config.png, queue_with_one_processor.png, 
> queue_with_two_processors.png, template_overview.png
>
>
> For my template, please check the picture "template_overview.png". On the 
> left-hand side is the working (two-processor) example and on the right-hand 
> side the non-working one (one processor).
> I have a ListSFTP processor which reads files from 4 different folders. The 
> filename of each file contains a number (epoch time), which I'm parsing and 
> setting as the "priority" attribute. We have a cluster, so what I want to 
> achieve for the FetchSFTP is that the files are fetched in order and are 
> equally distributed over our 8-node cluster.
> However, it seems that if I set the "priority" attribute in an 
> UpdateAttribute processor and use the following features on the directly 
> attached connector:
>  * Load Balance Strategy: Round Robin
>  * Select Prioritizers: PriorityAttributePrioritizer
> the prioritizer doesn't seem to have any impact. 
> If I set the priority attribute in an extra processor and use only the 
> prioritizer there, all files are in order but still on the primary node. 
> On the next processor I then set the load balancing strategy for the 
> cluster (and add another attribute, but that doesn't matter) together with 
> the prioritizer. That way it works. A picture of the queue for both examples 
> is attached (queue_with_one_processor.png & queue_with_two_processors.png).
> *To sum up*, it seems that if I set the "priority" attribute in an 
> UpdateAttribute processor and try to use it directly on the attached 
> connector with a load balancing strategy and the prioritizer 
> (PriorityAttributePrioritizer), then the priority attribute doesn't work as 
> expected. If I set the "priority" attribute in a separate processor and 
> then apply the load balancing strategy together with the prioritizer on an 
> additional processor, then it works. 
> Cheers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
