maheshguptags commented on issue #12738:
URL: https://github.com/apache/hudi/issues/12738#issuecomment-2658477298
Here is an example to illustrate the issue. The job successfully ingested and checkpointed 5.5M of the 10M records. However, whenever the job was interrupted (either manually or by autoscaling), the remaining 4.5M records were discarded.
**Example: ingest 10M records**
chkpnt1 → succeeded → ingested 2.5M (out of 10M)
chkpnt2 → succeeded → ingested 3M (of the remaining 7.5M)
chkpnt3 → failed (either manually or due to autoscaling) → no data written to the Hudi table, and **the remaining 4.5M records are discarded after this point**
chkpnt4 (next attempt) → succeeded → no data is written because of the failure at chkpnt3, and the checkpoint completes within milliseconds
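The sequence above can be sketched as a small simulation. This is purely illustrative: the checkpoint names and record counts come from the scenario in this report, and the discard-on-failure rule mirrors the observed behavior, not any actual Hudi or Flink API.

```python
def run_checkpoints(checkpoints, total_records):
    """Replay a checkpoint sequence where a failed checkpoint
    discards every record not yet committed (observed behavior)."""
    committed = 0
    discarded = 0
    for name, batch, succeeded in checkpoints:
        if succeeded:
            committed += batch
        else:
            # failure: the in-flight remainder is lost
            discarded = total_records - committed
    return committed, discarded

# Counts taken from the example in this comment.
checkpoints = [
    ("chkpnt1", 2_500_000, True),   # ingested 2.5M
    ("chkpnt2", 3_000_000, True),   # ingested 3M more
    ("chkpnt3", 0,         False),  # interrupted; remainder discarded
    ("chkpnt4", 0,         True),   # completes in ms, writes nothing
]

committed, discarded = run_checkpoints(checkpoints, total_records=10_000_000)
print(committed, discarded)  # 5500000 4500000
```

Running it reproduces the reported numbers: 5.5M records committed, 4.5M silently lost after the failed checkpoint, with chkpnt4 contributing nothing.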
Thanks
Mahesh
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]