Aditya Goenka created HUDI-7912:
-----------------------------------

             Summary: Couldn't restart streams when using Spark Structured 
Streaming when Kafka offset goes out of range
                 Key: HUDI-7912
                 URL: https://issues.apache.org/jira/browse/HUDI-7912
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark
            Reporter: Aditya Goenka
             Fix For: 0.16.0


When using Spark Structured Streaming to read from Kafka and write to Hudi, 
a job sometimes cannot keep up with the input rate and fails because the Kafka 
offset goes out of range (i.e., the earliest Kafka messages have been cleaned 
up by the retention policy). When we then try to restart the job by clearing 
the previous checkpoint and consuming from the latest offset, the batches are 
skipped by the 'HoodieStreamingSink'. 

There is currently no way to restart these streams.
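
The report does not say why the batches are skipped. One plausible mechanism, 
assumed here and not confirmed by the report, is that the sink records the last 
committed batch ID in the table itself and treats any batch ID at or below it 
as a replay. Because Spark numbers batches from 0 again when a query starts 
with a fresh checkpoint, every new batch would look like a replay. The Python 
sketch below is a toy model of that assumption only; `ToySink`, `add_batch`, 
and `run_query` are hypothetical names, not Hudi or Spark APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySink:
    """Toy stand-in for a streaming sink with batch-ID-based idempotency.

    Hypothetical model, not Hudi code: the highest committed batch ID is
    remembered alongside the table, not in the Spark checkpoint directory.
    """
    last_committed_batch_id: Optional[int] = None
    written: List[str] = field(default_factory=list)

    def add_batch(self, batch_id: int, rows: List[str]) -> bool:
        # Assumed rule: a batch ID at or below the recorded one is a replay.
        if (self.last_committed_batch_id is not None
                and batch_id <= self.last_committed_batch_id):
            return False  # batch skipped, nothing written
        self.written.extend(rows)
        self.last_committed_batch_id = batch_id
        return True


def run_query(sink: ToySink, batches: List[List[str]]) -> List[bool]:
    """Feed batches with IDs starting at 0, as a query with a fresh checkpoint does."""
    return [sink.add_batch(i, rows) for i, rows in enumerate(batches)]


sink = ToySink()
# First run: batch IDs 0, 1, 2 are written and the sink remembers ID 2.
first_run = run_query(sink, [["a"], ["b"], ["c"]])
# Checkpoint cleared: batch IDs restart at 0, but the sink still remembers ID 2.
restarted_run = run_query(sink, [["d"], ["e"]])
```

Under this model, the restarted query keeps skipping until its batch IDs pass 
the recorded one, which would match the symptom described above.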



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
