hudi-bot opened a new issue, #16528: URL: https://github.com/apache/hudi/issues/16528
When using Spark Structured Streaming with Kafka and writing data to Hudi, the job sometimes cannot keep up with the input rate and fails because the Kafka offsets go out of range (i.e., the earliest Kafka messages are cleaned up by the retention policy). When we then try to restart the job by clearing the previous checkpoint and consuming from the latest offsets, the batches are skipped by `HoodieStreamingSink`. There is currently no way to restart these streams.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-7912
- Type: Bug
- Fix version(s): 0.16.0
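For context, the restart scenario described above can be sketched roughly as follows. This is a minimal configuration sketch, not the reporter's actual job: the topic name, bootstrap servers, table name, and paths are all illustrative placeholders, and the Hudi write options shown are the common documented ones.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-to-hudi").getOrCreate()

// Read from Kafka. After clearing the old checkpoint, the stream is asked to
// resume from the latest offsets, since the earliest ones were purged by retention.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder brokers
  .option("subscribe", "events")                    // placeholder topic
  .option("startingOffsets", "latest")              // restart point after checkpoint reset
  .option("failOnDataLoss", "false")                // tolerate offsets removed by retention
  .load()

// Write to Hudi; the "hudi" format routes through HoodieStreamingSink,
// which the report says skips batches after such a restart.
df.writeStream
  .format("hudi")
  .option("hoodie.table.name", "events_table")                  // placeholder table name
  .option("checkpointLocation", "/tmp/checkpoints/events")      // cleared before restart
  .start("/tmp/hudi/events_table")                              // placeholder base path
```

The key steps are clearing the `checkpointLocation` contents and setting `startingOffsets` to `latest`; per the report, it is after this sequence that `HoodieStreamingSink` skips the incoming batches.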
