As an additional note to what Bryan said, there is a JIRA [1] that could
help in this case.
I'm trying to find time to work on it... but no luck so far :)

[1] https://issues.apache.org/jira/browse/NIFI-4026
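
To make the per-key ordering idea from Bryan's reply (quoted below) more concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API. The function names assign_order and release_in_order, and the "key"/"order" field names, are hypothetical and only illustrate the two steps he describes: tagging each event with an integer order within its key, and holding back anything that arrives ahead of its predecessor.

```python
from collections import defaultdict


def assign_order(events):
    """Tag each (key, payload) event with a 1-based, per-key sequence number."""
    counters = defaultdict(int)
    tagged = []
    for key, payload in events:
        counters[key] += 1
        tagged.append({"key": key, "order": counters[key], "payload": payload})
    return tagged


def release_in_order(arrivals):
    """Buffer out-of-order arrivals and emit each key's events in sequence."""
    next_expected = defaultdict(lambda: 1)
    waiting = defaultdict(dict)  # key -> {order: event}
    released = []
    for event in arrivals:
        key = event["key"]
        waiting[key][event["order"]] = event
        # Drain every event for this key that is now next in line.
        while next_expected[key] in waiting[key]:
            released.append(waiting[key].pop(next_expected[key]))
            next_expected[key] += 1
    return released


# Events in the order they were listed (hypothetical sample data).
listed = [("111", "A-event"), ("222", "A-event"), ("111", "B-event")]
tagged = assign_order(listed)

# Simulate the B-event for id 111 overtaking its A-event between nodes.
arrivals = [tagged[2], tagged[1], tagged[0]]
published = release_in_order(arrivals)
```

In this sketch the B-event for id 111 is held until the A-event for the same id has been released, while id 222 is not delayed by it. The order has to be assigned at a single point (here, the listing step) before the data is spread across nodes.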



2018-05-31 15:43 GMT+02:00 Bryan Bende <[email protected]>:

> Hello,
>
> If I'm understanding the situation correctly, you want ordering within
> a key, but not necessarily total ordering across all your data?
>
> I'm making this assumption since you said you have 9 partitions on
> your Kafka topic and you are partitioning by key, so the data for each
> key is in order per partition.
>
> The list + fetch pattern with redistribution doesn't have a way to
> control how the data is distributed. It is just round-robin: you can
> control the batch size, but you can't partition the data to nodes
> based on a key.
>
> There is an EnforceOrder processor [1] which was made to help with
> this kind of scenario, I believe specifically for CDC scenarios where
> the event log has to be processed in order. I haven't used it myself
> so maybe others can help here, but I believe you would use your "key"
> as the "Group Identifier" and then somehow you need to get an integer
> value on each flow file that represents the order within the group. So
> for example your A-event flow file would need some kind of attribute
> like "order = 1" and then the B-event flow file would need an
> attribute like "order = 2". You might be able to assign this order
> using an UpdateAttribute processor right after the ListSFTP, but you
> have to do it per key somehow.
>
> Another option is to just run the whole flow on primary node without
> doing the site-to-site redistribution, but then you lose out on
> parallel processing, and even on a single node I believe there are
> cases where ordering is not guaranteed.
>
> Thanks,
>
> Bryan
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
>
>
> On Wed, May 30, 2018 at 4:35 PM, rey26 <[email protected]> wrote:
> > Hello Team,
> >
> > We have an Apache NiFi cluster with 3 nodes and a 3-node Kafka cluster. We
> > are receiving files that contain transactions in order (A-type first,
> > then B-type).
> > These events are in order but may arrive in different files. For example,
> > the A-event for id 111 can be present in file 1 and the B-event can come
> > in the immediately following file 2 [B will always come after A-type for
> > any ID]. We want the data to be published in the same order as it is
> > received.
> >
> > We developed a flow using ListSFTP + FetchFTP + PublishKafka, in that
> > order, and have also partitioned the Kafka topic [9 partitions] on the
> > basis of a key column; the same key is used in the PublishKafka processor.
> >
> > All the events are published to the same partition, but within the
> > partition they are out of order.
> > For example, B-type events arrive before A-type events in the Kafka
> > topic TEST.
> >
> > Now I have some questions regarding the above:
> >
> > My understanding is that ListSFTP + FetchFTP improves load balancing, but
> > does it ensure ordering?
> > File1 may go to Node1 and File2 may go to Node2, and Node2 can publish its
> > records to the same Kafka partition before Node1?
> > Is there any way to guarantee the load order of files in Apache NiFi in
> > cluster mode while keeping performance in mind?
> >
> > Since each task in the PublishKafka processor is one publisher, if we run
> > PublishKafka on only the primary node and pass only one broker id, will
> > that do the trick?
> >
