Alaksiej Ščarbaty created NIFI-14820:
----------------------------------------

             Summary: Add option to support original message ordering in 
ConsumeKafka
                 Key: NIFI-14820
                 URL: https://issues.apache.org/jira/browse/NIFI-14820
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
            Reporter: Alaksiej Ščarbaty


When using schema registries, messages with different schemas can land in the 
same topic, e.g. during a rolling update that changes the schema on the 
producer, or simply when different message types are written to the same topic.

In several cases, preserving message ordering (within a single partition) in a 
pipeline is important.
For OLTP workloads, messages can share causal dependencies, so they must be 
processed in a particular order. Some downstream processors may also rely on 
messages arriving in the exact order in which they were received from Kafka.

Currently, {{ConsumeKafka}} groups records into flow files by their 
topic-partitions *as well as schemas* 
([source|https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-kafka-bundle/nifi-kafka-processors/src/main/java/org/apache/nifi/kafka/processors/consumer/convert/AbstractRecordStreamKafkaMessageConverter.java#L128]).
As a result of this grouping, messages may be passed downstream out of order.
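
The reordering can be illustrated with a minimal Python sketch. It is not the 
NiFi implementation: the {{Record}} tuple and the 
{{group_by_partition_and_schema}} function are hypothetical names, and each 
resulting group stands in for one flow file.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a consumed Kafka record."""
    topic: str
    partition: int
    offset: int
    schema: str


def group_by_partition_and_schema(records):
    """Bucket records per (topic, partition, schema).

    Each bucket models one flow file. Buckets are returned in the order
    in which their first record was seen (dicts keep insertion order).
    """
    groups = {}
    for record in records:
        key = (record.topic, record.partition, record.schema)
        groups.setdefault(key, []).append(record)
    return list(groups.values())


# A single partition where the schema changes from v1 to v2 and back,
# as could happen during a rolling update of the producers.
consumed = [
    Record("orders", 0, 0, "v1"),
    Record("orders", 0, 1, "v2"),
    Record("orders", 0, 2, "v1"),
]

flow_files = group_by_partition_and_schema(consumed)

# Offsets in the order a downstream processor would see them.
downstream_offsets = [r.offset for ff in flow_files for r in ff]
print(downstream_offsets)  # [0, 2, 1]: offset 2 overtakes offset 1
```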

The processor should support both _Roll FlowFile_ (new) and _Group Records By 
Schema_ (existing) strategies, as in [ConsumeKinesis 
(WIP)|https://github.com/apache/nifi/pull/10053/files#diff-5e86a490a55dd29cce20638adb0f41401095aee8d94a734bf862c02f4ecf7fa8].
 To preserve backward compatibility, _Group Records By Schema_ should be chosen 
by default.
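
A rough Python sketch of how the two strategies could differ for records of a 
single partition. It assumes that _Roll FlowFile_ means starting a new flow 
file whenever the schema of the next record differs from the previous one; the 
issue does not spell this out. All names below ({{roll_flow_files}}, 
{{group_by_schema}}, {{to_flow_files}}, the strategy keys) are hypothetical and 
not taken from NiFi.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record from one topic-partition."""
    offset: int
    schema: str


def group_by_schema(records):
    """Existing behavior (sketch): one flow file per schema."""
    groups = {}
    for record in records:
        groups.setdefault(record.schema, []).append(record)
    return list(groups.values())


def roll_flow_files(records):
    """Assumed new behavior: close the current flow file and start a new
    one whenever the schema changes, so the original order is kept."""
    flow_files = []
    for record in records:
        if not flow_files or flow_files[-1][-1].schema != record.schema:
            flow_files.append([])
        flow_files[-1].append(record)
    return flow_files


STRATEGIES = {
    "GROUP_RECORDS_BY_SCHEMA": group_by_schema,
    "ROLL_FLOWFILE": roll_flow_files,
}

# Backward compatible default, as the issue requests.
DEFAULT_STRATEGY = "GROUP_RECORDS_BY_SCHEMA"


def to_flow_files(records, strategy=DEFAULT_STRATEGY):
    """Split records into flow files using the chosen strategy."""
    return STRATEGIES[strategy](records)


def offsets(flow_files):
    """Flatten flow files into the offset order seen downstream."""
    return [r.offset for ff in flow_files for r in ff]


records = [Record(0, "v1"), Record(1, "v2"), Record(2, "v1")]
print(offsets(to_flow_files(records)))                   # [0, 2, 1]
print(offsets(to_flow_files(records, "ROLL_FLOWFILE")))  # [0, 1, 2]
```

In this sketch, rolling trades fewer, larger flow files for strict ordering: 
every schema change produces an additional flow file.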



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
