Aditya Goenka created HUDI-7912:
-----------------------------------
Summary: Couldn't restart streams when using Spark Structured Streaming when Kafka offset goes out of range
Key: HUDI-7912
URL: https://issues.apache.org/jira/browse/HUDI-7912
Project: Apache Hudi
Issue Type: Bug
Components: spark
Reporter: Aditya Goenka
Fix For: 0.16.0
When using Spark Structured Streaming to consume from Kafka and write data to
Hudi, jobs sometimes cannot keep up with the input rate and fail because the
Kafka offset goes out of range (i.e. the earliest Kafka messages have already
been deleted by the retention policy). When we try to restart such a job by
clearing the previous checkpoint and consuming from the latest offset, the
batches are skipped by 'HoodieStreamingSink'.
There is currently no way to restart these streams.
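One plausible reading of why the batches are skipped can be sketched as a small model. It assumes, and this is not stated in the issue, that the sink remembers the id of the last micro-batch it committed to the table and ignores any incoming batch whose id is not greater than that value. Because clearing the Spark checkpoint restarts micro-batch numbering at 0, every new batch would then look like a replay of an already-committed one. The function `batches_to_write` and the batch ids used below are hypothetical illustrations, not Hudi's actual code.

```python
from typing import Iterable, List, Optional


def batches_to_write(incoming_batch_ids: Iterable[int],
                     last_committed_batch_id: Optional[int]) -> List[int]:
    """Return the micro-batch ids that would actually be written.

    Hypothetical model (an assumption, not Hudi's implementation): a batch
    whose id is not greater than the last committed batch id is treated as
    already committed and skipped.
    """
    written: List[int] = []
    last = last_committed_batch_id
    for batch_id in incoming_batch_ids:
        if last is not None and batch_id <= last:
            # Skipped: looks like a replay of an already-committed batch.
            continue
        written.append(batch_id)
        last = batch_id
    return written


# Hypothetical example: the table last recorded batch 41 before the failure.
# After clearing the Spark checkpoint, numbering restarts at 0.
print(batches_to_write(range(0, 5), last_committed_batch_id=41))   # []
print(batches_to_write(range(42, 45), last_committed_batch_id=41))  # [42, 43, 44]
```

Under this model, the restarted stream would only resume writing once its batch ids caught up with the previously committed id, which matches the symptom of batches being skipped after the restart.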