Alaksiej Ščarbaty created NIFI-14820:
----------------------------------------
Summary: Add option to support original message ordering in
ConsumeKafka
Key: NIFI-14820
URL: https://issues.apache.org/jira/browse/NIFI-14820
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Reporter: Alaksiej Ščarbaty
When using schema registries, messages with different schemas can land in the
same topic, e.g. during a rolling update that changes the schema on the
producer, or simply when different message types are written to the same topic.
In several cases, preserving message ordering (within a single partition)
throughout a pipeline is important.
For OLTP workloads, messages can share causal dependencies, so they must be
processed in a particular order. Some downstream processors may also rely on
messages arriving in exactly the order in which they were received from Kafka.
Currently, {{ConsumeKafka}} groups records into FlowFiles by their
topic-partitions *as well as schemas*
([source|https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-kafka-bundle/nifi-kafka-processors/src/main/java/org/apache/nifi/kafka/processors/consumer/convert/AbstractRecordStreamKafkaMessageConverter.java#L128]).
As a result, messages from a single partition may be passed downstream out of
order whenever schemas are interleaved.
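
The reordering can be illustrated with a minimal sketch. This is not the
NiFi implementation; the {{(offset, schema)}} tuples and the
{{group_by_schema}} helper are hypothetical and only model the grouping
described above for a single partition.

```python
# Hypothetical records from ONE partition: (offset, schema id).
# Schemas are interleaved, as during a rolling producer update.
records = [(0, "v1"), (1, "v2"), (2, "v1"), (3, "v2")]


def group_by_schema(records):
    """Collect offsets into one batch per schema (first-seen schema order).

    Models only the grouping idea; a real consumer also keys on
    topic-partition, which is constant in this example.
    """
    groups = {}
    for offset, schema in records:
        groups.setdefault(schema, []).append(offset)
    return list(groups.values())


batches = group_by_schema(records)
# Order in which offsets reach downstream if batches are emitted one by one.
emitted = [offset for batch in batches for offset in batch]

print(batches)  # [[0, 2], [1, 3]]
print(emitted)  # [0, 2, 1, 3]
```

Offset 2 is emitted before offset 1, so a consumer that depends on
partition order would see the records out of sequence.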
The processor should support both _Roll FlowFile_ (new) and _Group Records By
Schema_ (existing) strategies, as in [ConsumeKinesis
(WIP)|https://github.com/apache/nifi/pull/10053/files#diff-5e86a490a55dd29cce20638adb0f41401095aee8d94a734bf862c02f4ecf7fa8].
To preserve backward compatibility, _Group Records By Schema_ should remain
the default.
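
A rough sketch of how a _Roll FlowFile_ strategy might behave, under the
assumption that it closes the current FlowFile and starts a new one whenever
the schema changes. The exact semantics are up to the implementation; the
{{roll_on_schema_change}} helper and the sample records are hypothetical.

```python
# Hypothetical records from ONE partition: (offset, schema id).
records = [(0, "v1"), (1, "v1"), (2, "v2"), (3, "v1")]


def roll_on_schema_change(records):
    """Start a new batch each time the schema differs from the previous record.

    Assumed behaviour for illustration only: batches stay contiguous, so
    emitting them in sequence preserves the original partition order.
    """
    batches = []
    current_schema = None
    for offset, schema in records:
        if not batches or schema != current_schema:
            batches.append([])  # roll: close the old batch, open a new one
            current_schema = schema
        batches[-1].append(offset)
    return batches


rolled = roll_on_schema_change(records)
emitted = [offset for batch in rolled for offset in batch]

print(rolled)   # [[0, 1], [2], [3]]
print(emitted)  # [0, 1, 2, 3]
```

The trade-off in this sketch is more, smaller FlowFiles when schemas
alternate often, in exchange for keeping the original order.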