npawar commented on a change in pull request #8309:
URL: https://github.com/apache/pinot/pull/8309#discussion_r821287802
##########
File path:
pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConnectionHandler.java
##########
@@ -56,6 +57,7 @@ public KafkaPartitionLevelConnectionHandler(String clientId, StreamConfig stream
     consumerProp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, _config.getBootstrapHosts());
     consumerProp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
     consumerProp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, BytesDeserializer.class.getName());
+    consumerProp.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, OffsetResetStrategy.EARLIEST.toString().toLowerCase());
Review comment:
Actually, we do know what caused the OOR (offset out of range): the server was
down for a while, and when it came back up, the offsets had expired. This would
be the most common case (along with any pause feature we add in the future).
Such a user would always want consumption to resume with minimal data loss.
I would treat this as a bug rather than configurable behavior. By default we
reset to latest, which causes far more data loss even though rows are still
present in the topic. With this change, we fast-forward only to the next point
from which we can consume. The behavior is similar to the ValidationManager.
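To make the effect of the added line concrete, here is a minimal, self-contained sketch of the consumer-properties setup. It is not the Pinot code itself: `buildConsumerProps` is a hypothetical helper, and the string literals stand in for the Kafka client constants (`ConsumerConfig.AUTO_OFFSET_RESET_CONFIG` resolves to `"auto.offset.reset"`, and `OffsetResetStrategy.EARLIEST.toString().toLowerCase()` yields `"earliest"`).

```java
import java.util.Properties;

public class OffsetResetSketch {
    // Hypothetical helper mirroring the patched constructor: builds the
    // consumer properties the handler would pass to a Kafka consumer.
    static Properties buildConsumerProps(String bootstrapHosts) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapHosts);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.BytesDeserializer");
        // The added config: when the requested offset is out of range
        // (e.g. expired by topic retention while the server was down),
        // resume from the earliest offset still available rather than
        // jumping to latest, minimizing data loss.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        Properties p = buildConsumerProps("localhost:9092");
        System.out.println(p.getProperty("auto.offset.reset"));
    }
}
```

With `earliest`, a consumer whose committed offset has expired restarts from the oldest retained message; with the Kafka default of `latest`, it would skip everything still retained in the topic.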
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]