[jira] [Created] (HUDI-6561) Ensure there is no data duplication with spark streaming writes

sivabalan narayanan (Jira) Tue, 18 Jul 2023 22:48:04 -0700

sivabalan narayanan created HUDI-6561:
-----------------------------------------


             Summary: Ensure there is no data duplication with spark streaming writes
                 Key: HUDI-6561
                 URL: https://issues.apache.org/jira/browse/HUDI-6561
             Project: Apache Hudi
          Issue Type: Improvement
          Components: spark
            Reporter: sivabalan narayanan


With Spark streaming writes, we can use the batchId to tell a first batch 
apart from an existing batch that is resumed after a long time. 

 

We should guarantee idempotency by checking the batchId, so that a resumed 
batch does not write duplicate data. 
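
A minimal sketch of the idea, not Hudi's actual implementation: a toy 
in-memory sink that remembers the last committed batchId and skips any batch 
whose id it has already seen. The class and method names (IdempotentSink, 
write_batch, last_committed_batch_id) are hypothetical and chosen only for 
illustration. It also assumes that batch ids increase monotonically and that 
the data and the batch id are committed together.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IdempotentSink:
    """Toy in-memory sink (hypothetical, not a Hudi or Spark API)."""

    rows: List[dict] = field(default_factory=list)
    last_committed_batch_id: Optional[int] = None

    def write_batch(self, batch_id: int, batch_rows: List[dict]) -> bool:
        """Write the batch unless its id was already committed.

        Returns True if the rows were written, False if the batch was skipped.
        """
        # A batch id at or below the last committed one means this batch was
        # already written (for example, it is replayed after a restart).
        already_committed = (
            self.last_committed_batch_id is not None
            and batch_id <= self.last_committed_batch_id
        )
        if already_committed:
            return False

        # Assumption: a real table would persist the rows and the batch id
        # in one atomic commit. Here both are plain in-memory updates.
        self.rows.extend(batch_rows)
        self.last_committed_batch_id = batch_id
        return True


sink = IdempotentSink()
first = sink.write_batch(0, [{"id": 1}])      # first batch: written
replayed = sink.write_batch(0, [{"id": 1}])   # same batch resumed: skipped
second = sink.write_batch(1, [{"id": 2}])     # next batch: written
```

The check only prevents duplicates if the last committed batchId survives a 
restart, so in practice it would have to be stored alongside the committed 
data rather than in process memory.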



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
