As an additional note to what Bryan said, there is a JIRA [1] that could help in this case. I'm trying to find time to work on it... but no luck so far :)
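To make the per-key ordering idea from Bryan's message (quoted below) a bit more concrete, here is a rough sketch in plain Python. It is not NiFi code and does not reflect how EnforceOrder is implemented; `assign_order` and `OrderEnforcer` are hypothetical names, and the sketch assumes each event already carries its key and that arrival order at the listing step is the order we want to preserve.

```python
from collections import defaultdict
import heapq


def assign_order(events):
    """Tag each (key, payload) event with a per-key sequence number.

    Assumption: the events are iterated in the order they should be kept.
    """
    counters = defaultdict(int)
    tagged = []
    for key, payload in events:
        counters[key] += 1
        tagged.append((key, counters[key], payload))
    return tagged


class OrderEnforcer:
    """Buffer events per key and release them only in sequence order."""

    def __init__(self, initial_order=1):
        # Next sequence number expected for each key.
        self.next_order = defaultdict(lambda: initial_order)
        # key -> min-heap of (order, payload) still waiting for predecessors.
        self.waiting = defaultdict(list)

    def offer(self, key, order, payload):
        """Accept one event; return the events that can now be released."""
        heap = self.waiting[key]
        heapq.heappush(heap, (order, payload))
        released = []
        while heap and heap[0][0] == self.next_order[key]:
            released_order, released_payload = heapq.heappop(heap)
            released.append((key, released_order, released_payload))
            self.next_order[key] += 1
        return released


if __name__ == "__main__":
    tagged = assign_order([("111", "A-event"), ("222", "A-event"), ("111", "B-event")])
    enforcer = OrderEnforcer()
    # Simulate the B-event for id 111 reaching the publisher first.
    for key, order, payload in [tagged[2], tagged[1], tagged[0]]:
        print(key, order, payload, "->", enforcer.offer(key, order, payload))
```

The key plays the role of the "Group Identifier" and the integer plays the role of the "order" attribute: a B-event that arrives early is simply held back until the A-event for the same key has gone through.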
[1] https://issues.apache.org/jira/browse/NIFI-4026

2018-05-31 15:43 GMT+02:00 Bryan Bende <[email protected]>:

> Hello,
>
> If I'm understanding the situation correctly, you want ordering within
> a key, but not necessarily total ordering across all your data?
>
> I'm making this assumption since you said you have 9 partitions on
> your Kafka topic and you are partitioning by key, so the data for each
> key is in order per partition.
>
> The list + fetch pattern with redistribution doesn't have a way to
> control how the data is distributed. It is just round-robin, and you
> can control the batch size, but you can't partition the data to nodes
> based on a key.
>
> There is an EnforceOrder processor [1] which was made to help with
> this kind of scenario, I believe specifically for CDC scenarios where
> the event log has to be processed in order. I haven't used it myself,
> so maybe others can help here, but I believe you would use your "key"
> as the "Group Identifier" and then somehow you need to get an integer
> value on each flow file that represents the order within the group. So
> for example your A-event flow file would need some kind of attribute
> like "order = 1" and then the B-event flow file would need an
> attribute like "order = 2". You might be able to assign this order
> using an UpdateAttribute processor right after the ListSFTP, but you
> have to do it per key somehow.
>
> Another option is to just run the whole flow on the primary node
> without doing the site-to-site redistribution, but then you lose out
> on parallel processing, and even on a single node I believe there are
> cases where ordering is not guaranteed.
>
> Thanks,
>
> Bryan
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
>
>
> On Wed, May 30, 2018 at 4:35 PM, rey26 <[email protected]> wrote:
> > Hello Team,
> >
> > We have an Apache NiFi cluster with 3 nodes and a 3-node Kafka
> > cluster. We are receiving some files which have transactions in
> > order (A-type first and then B-type).
> > These events are in order but may come in different files. For
> > example, the A-event for id 111 can be present in file 1 and the
> > B-event can come in the immediately following file 2 [B will always
> > come after A-type for any ID]. We want the data to be published in
> > the same order as it is received.
> >
> > We developed a flow using the ListSFTP + FetchFTP + PublishKafka
> > combination, in that order. We have also partitioned the Kafka topic
> > [9 partitions] on the basis of a key column, and the same key is
> > used in the PublishKafka processor.
> >
> > All the events are published to the same partition, but within the
> > partition they are out of order.
> > Example: B-type events are coming before A-type in Kafka topic TEST.
> >
> > Now I have some queries regarding the above.
> >
> > What I understood is that ListSFTP + FetchFTP improves load
> > balancing, but does it ensure ordering?
> > File1 may go to Node1 and File2 may go to Node2, and Node2 can
> > publish the record to the same partition on Kafka before Node1?
> > Is there any way to guarantee the load order of files in Apache NiFi
> > in cluster mode, keeping performance in mind?
> >
> > Since each task in the PublishKafka processor is one publisher, if
> > we run PublishKafka on only the primary node and pass only one
> > broker id, will that do the trick?
> >
> >
> >
> > --
> > Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
>
