maheshguptags commented on issue #12738:
URL: https://github.com/apache/hudi/issues/12738#issuecomment-2658477298
Here is an example to illustrate the issue. The job successfully ingested and checkpointed 5.5M of the 10M records. However, whenever the job was interrupted (either manually or by autoscaling), the remaining 4.5M records were discarded.
**Example: ingest 10M records**
chkpnt1 → succeeded → ingested 2.5M (out of 10M)
chkpnt2 → succeeded → ingested 3M (of the remaining 7.5M)
chkpnt3 → failed (either manually or due to autoscaling) → no data written to the Hudi table, and **the remaining 4.5M records are discarded after this point**
chkpnt4 (next attempt) → succeeded → no data is written because of the failure at chkpnt3, and the checkpoint completes within milliseconds
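The sequence above can be sketched as a small simulation. This is purely illustrative: the checkpoint names and record counts come from the scenario in this report, and the discard-on-failure rule mirrors the observed behavior, not any actual Hudi or Flink API.

```python
def run_checkpoints(checkpoints, total_records):
    """Replay a checkpoint sequence where a failed checkpoint
    discards every record not yet committed (observed behavior)."""
    committed = 0
    discarded = 0
    for name, batch, succeeded in checkpoints:
        if succeeded:
            committed += batch
        else:
            # failure: the in-flight remainder is lost
            discarded = total_records - committed
    return committed, discarded

# Counts taken from the example in this comment.
checkpoints = [
    ("chkpnt1", 2_500_000, True),   # ingested 2.5M
    ("chkpnt2", 3_000_000, True),   # ingested 3M more
    ("chkpnt3", 0,         False),  # interrupted; remainder discarded
    ("chkpnt4", 0,         True),   # completes in ms, writes nothing
]

committed, discarded = run_checkpoints(checkpoints, total_records=10_000_000)
print(committed, discarded)  # 5500000 4500000
```

Running it reproduces the reported numbers: 5.5M records committed, 4.5M silently lost after the failed checkpoint, with chkpnt4 contributing nothing.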
Thanks
Mahesh
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]