sajjad-moradi commented on code in PR #8321:
URL: https://github.com/apache/pinot/pull/8321#discussion_r903853686
##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConsumer.java:
##########
@@ -55,7 +58,12 @@ public MessageBatch<byte[]> fetchMessages(long startOffset, long endOffset, int
LOGGER.debug("poll consumer: {}, startOffset: {}, endOffset:{} timeout: {}ms", _topicPartition, startOffset,
endOffset, timeoutMillis);
}
- _consumer.seek(_topicPartition, startOffset);
+ Map<TopicPartition, Long> beginningOffsets = _consumer.beginningOffsets(Lists.newArrayList(_topicPartition));
+ Long beginningOffset = beginningOffsets.values().iterator().next();
+ // explicitly check for OutOfRange, where startOffset < beginningOffset
+ // without this, _consumer.poll will auto offset reset to latest, resulting in data loss
+ _consumer.seek(_topicPartition, Math.max(startOffset, beginningOffset));
Review Comment:
We're adding one more call to Kafka on the execution path of every happy
case to fix a rare edge case. If the Kafka consumer doesn't throw an exception
in the out-of-range scenario, maybe we should check the fetched messages
instead: if no messages come back, get the beginning offset, seek to it, and
then fetch again?
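
A rough sketch of that alternative, in case it helps the discussion. It is
not the PR's code: `PartitionConsumer`, `FakeConsumer`, and `fetch` are
hypothetical names, and the fake only models the behavior described in the
diff comment (a poll from an offset below the beginning offset returns nothing
and resets to latest). The real consumer's behavior depends on its
configuration, so treat this as an illustration of the control flow only.

```java
import java.util.ArrayList;
import java.util.List;

class LazyOffsetCheck {
  /** Hypothetical stand-in for the consumer calls discussed above; not a Kafka or Pinot API. */
  interface PartitionConsumer {
    void seek(long offset);

    List<Long> poll();       // offsets of the fetched records

    long beginningOffset();  // the extra round trip we want off the happy path
  }

  /** Fetch first; look up the beginning offset only when nothing came back. */
  static List<Long> fetch(PartitionConsumer consumer, long startOffset) {
    consumer.seek(startOffset);
    List<Long> records = consumer.poll();
    if (!records.isEmpty()) {
      return records; // happy path: no extra call
    }
    long beginning = consumer.beginningOffset();
    if (startOffset < beginning) {
      // startOffset fell out of range: retry from the earliest available offset
      consumer.seek(beginning);
      return consumer.poll();
    }
    return records; // genuinely no new data
  }

  /** In-memory fake holding offsets [begin, end); counts beginningOffset() calls. */
  static class FakeConsumer implements PartitionConsumer {
    private final long _begin;
    private final long _end;
    private long _position;
    int _beginningOffsetCalls = 0;

    FakeConsumer(long begin, long end) {
      _begin = begin;
      _end = end;
    }

    public void seek(long offset) {
      _position = offset;
    }

    public List<Long> poll() {
      List<Long> out = new ArrayList<>();
      if (_position < _begin) {
        _position = _end; // assumed reset-to-latest, as described in the diff comment
        return out;
      }
      while (_position < _end) {
        out.add(_position++);
      }
      return out;
    }

    public long beginningOffset() {
      _beginningOffsetCalls++;
      return _begin;
    }
  }

  public static void main(String[] args) {
    FakeConsumer consumer = new FakeConsumer(100, 105);
    System.out.println(fetch(consumer, 50) + " extra calls: " + consumer._beginningOffsetCalls);
  }
}
```

One trade-off with this shape: an empty fetch also occurs when the consumer
is simply caught up, so the beginning-offset lookup still runs in that case,
just not on fetches that return data.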
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]