vrajat opened a new pull request, #12157: URL: https://github.com/apache/pinot/pull/12157
Pinot can go multiple hours between polls of a partition in a Kafka topic. One specific example: Pinot took a long time to flush a segment to disk. In the meantime, messages in Kafka can expire if the message retention time is short. If `auto.offset.reset` is set to `smallest`, Kafka silently moves the offset to the first available message, which leads to data loss.

Before consuming messages from Kafka, check whether any messages have expired by comparing the `startOffset` in the `RealtimeSegmentDataManager` to the `beginOffset` of the Kafka partition. If `startOffset` < `beginOffset`, log the condition and set a gauge to 1. The gauge can be connected to an alerting system.
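
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `OffsetExpiryCheck`, the method names, and the `AtomicLong` standing in for a metrics gauge are all hypothetical, and no Pinot or Kafka APIs are used.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical helper; the names here are illustrative and not taken from Pinot.
final class OffsetExpiryCheck {
    private static final Logger LOGGER = Logger.getLogger(OffsetExpiryCheck.class.getName());

    // Stand-in for a metrics gauge (assumption: a real system would use its metrics registry).
    static final AtomicLong DATA_LOSS_GAUGE = new AtomicLong(0);

    // True when the offset we intend to consume from is older than the
    // earliest offset still retained in the partition.
    static boolean messagesExpired(long startOffset, long beginOffset) {
        return startOffset < beginOffset;
    }

    // Runs the check before consuming. Returns the number of offsets skipped
    // (0 if nothing expired); logs and sets the gauge to 1 when messages were lost.
    static long checkBeforeConsume(String topic, int partition, long startOffset, long beginOffset) {
        if (!messagesExpired(startOffset, beginOffset)) {
            return 0L;
        }
        long skipped = beginOffset - startOffset;
        LOGGER.warning(String.format(
            "Possible data loss on %s-%d: startOffset=%d < beginOffset=%d (%d offsets skipped)",
            topic, partition, startOffset, beginOffset, skipped));
        DATA_LOSS_GAUGE.set(1L);
        return skipped;
    }
}
```

In this sketch the gauge is only ever set to 1 and never reset, matching the description above; whether and when to clear it is left to the alerting setup.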
