Andrzej Bialecki created SOLR-18077:
---------------------------------------
Summary: CrossDC Consumer - out-of-order Kafka partition processing
Key: SOLR-18077
URL: https://issues.apache.org/jira/browse/SOLR-18077
Project: Solr
Issue Type: Bug
Components: module - crossDC
Affects Versions: 9.10
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
When mirrored requests are submitted to Kafka in {{KafkaMirroringSink}}, the
default partitioner ({{BuiltInPartitioner}}) is used. It submits
messages to partitions in batches, switching between partitions in a
round-robin fashion.
The same partitioner will be used (see below) by the MirrorMaker when adding
messages to the target Kafka topic. The
{{KafkaCrossDcConsumer.pollAndProcessRequests()}} method then retrieves new
records, but it iterates over partitions in an essentially arbitrary order
because {{ConsumerRecords.partitions}} is backed by a HashMap.
This means that the batches of messages retrieved from multiple partitions are
no longer necessarily in the same order as they were submitted. If requests in
these batches from multiple partitions refer to the same collection then they
may be applied out of order, leading to data divergence.
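
The effect can be reproduced with a small, Kafka-free model. The sketch below is only an illustration under simplifying assumptions: {{PartitionOrderDemo}}, {{produce}} and {{consume}} are hypothetical names, the batching is a strict fixed-size round-robin (the real partitioner is more involved), and the consumer drains each partition completely before moving to the next one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionOrderDemo {

    /** Assigns messages to partitions in fixed-size batches, rotating between partitions. */
    public static Map<Integer, List<String>> produce(
            List<String> messages, int numPartitions, int batchSize) {
        Map<Integer, List<String>> partitions = new LinkedHashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.put(p, new ArrayList<>());
        }
        for (int i = 0; i < messages.size(); i++) {
            // Simplified model: every batchSize messages, switch to the next partition.
            int partition = (i / batchSize) % numPartitions;
            partitions.get(partition).add(messages.get(i));
        }
        return partitions;
    }

    /** Drains each partition completely, visiting partitions in the given order. */
    public static List<String> consume(
            Map<Integer, List<String>> partitions, List<Integer> partitionOrder) {
        List<String> processed = new ArrayList<>();
        for (int p : partitionOrder) {
            processed.addAll(partitions.get(p));
        }
        return processed;
    }

    public static void main(String[] args) {
        // All six requests target the same (imaginary) collection.
        List<String> sent = List.of("add-1", "add-2", "delete-1", "delete-2", "add-3", "add-4");
        Map<Integer, List<String>> partitions = produce(sent, 2, 2);

        System.out.println("submitted:      " + sent);
        System.out.println("partitions 0,1: " + consume(partitions, List.of(0, 1)));
        System.out.println("partitions 1,0: " + consume(partitions, List.of(1, 0)));
    }
}
```

In this model the processed sequence depends on the order in which partitions are visited, and it does not have to match the order of submission. Within a single partition the order is always preserved.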
One possible solution is to explicitly use a different partitioning scheme when
submitting messages from {{KafkaMirroringSink}}. This happens automatically
when the {{ProducerRecord}} key is explicitly set, and we can use the
{{collection}} name as the key. This way all requests for the same collection
will end up in the same partition, thus preserving the ordering.
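
A rough model of the keyed approach, again without a Kafka dependency, is shown below. The names {{KeyedPartitioningDemo}}, {{Request}}, {{partitionFor}} and {{produce}} are hypothetical, and {{String.hashCode()}} is only a stand-in for the hash that Kafka computes over the serialized key. With the real client the equivalent step would be to pass the collection name as the key argument of the {{ProducerRecord(topic, key, value)}} constructor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedPartitioningDemo {

    /** Hypothetical stand-in for a mirrored request. */
    public static final class Request {
        public final String collection;
        public final String op;

        public Request(String collection, String op) {
            this.collection = collection;
            this.op = op;
        }

        @Override
        public String toString() {
            return collection + ":" + op;
        }
    }

    /**
     * Stand-in for key-based partitioning: the same key always maps to the
     * same partition. Kafka uses a different hash; hashCode() is used here
     * only for illustration.
     */
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Appends each request to the partition chosen by its collection name. */
    public static Map<Integer, List<Request>> produce(List<Request> requests, int numPartitions) {
        Map<Integer, List<Request>> partitions = new HashMap<>();
        for (Request r : requests) {
            int p = partitionFor(r.collection, numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(r);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Request> sent = List.of(
            new Request("books", "add-1"),
            new Request("films", "add-1"),
            new Request("books", "delete-1"),
            new Request("films", "add-2"));
        // Each collection appears in exactly one partition, in submission order.
        produce(sent, 3).forEach((p, reqs) -> System.out.println(p + " -> " + reqs));
    }
}
```

Because every request for a given collection lands in one partition, the order in which the consumer visits partitions no longer affects the relative order of requests for that collection. A possible trade-off of this scheme is that traffic for a very busy collection is concentrated in a single partition.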