Pranoti Shanbhag created HUDI-1947:
--------------------------------------
Summary: Hudi Commit Callback and commit in a single transaction
Key: HUDI-1947
URL: https://issues.apache.org/jira/browse/HUDI-1947
Project: Apache Hudi
Issue Type: New Feature
Reporter: Pranoti Shanbhag
Hello,
I am using Hudi commit callbacks to call an internal service. As I understand
it, the callback runs after the commit to the dataset has completed, and if the
callback service fails, the commit is not rolled back.
The service we call saves the commit time in a database, which multiple
pipelines read to get the incremental delta. For example, when there are 4
commits in the Hudi dataset, we register 4 commit timestamps in the database.
The pipelines that need the incremental delta run at different frequencies and
use this database to find the data committed since their respective last runs.
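To make the lookup described above concrete, here is a minimal sketch in Java.
It is illustrative only: CommitRegistry is a hypothetical in-memory stand-in for
our database, not a Hudi class, and it assumes commit times are fixed-width
timestamp strings (for example yyyyMMddHHmmss) so that string order matches
chronological order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the commit-time database.
class CommitRegistry {
    // Sorted set: assumes fixed-width timestamp strings, so lexicographic
    // order is also chronological order.
    private final NavigableSet<String> commitTimes = new TreeSet<>();

    // Called by the callback service. Registering the same commit time
    // twice has no additional effect, so retries are safe.
    void register(String commitTime) {
        commitTimes.add(commitTime);
    }

    // Returns every registered commit time strictly after the given one,
    // in ascending order. A pipeline passes the last commit it consumed.
    List<String> commitsAfter(String lastConsumed) {
        return new ArrayList<>(commitTimes.tailSet(lastConsumed, false));
    }
}
```

Each pipeline would store the last commit time it consumed and call
commitsAfter with it on its next run. A commit that was never registered is
invisible to this lookup, which is the gap this issue is about.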
For this to work well, we need the Hudi commit and the callback to be atomic,
in a single transaction. Otherwise, on callback failures, there may be data in
the Hudi dataset that is not registered in the DB.
Could you please let me know whether this can be supported, and whether there
is a way to achieve it with the current implementation? We do have retries set
up and are not expecting failures, but we want to keep the Hudi commits in sync
with what we register in the DB.
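For reference, one workaround we could apply on our side, short of a true
transaction, is a periodic reconciliation that compares the completed commits
in the dataset against the DB and registers any that are missing. The sketch
below is an assumption about how that could look, not an existing Hudi feature:
CommitReconciler and findUnregistered are hypothetical names, and the list of
completed commit times is assumed to have been read from the Hudi timeline
separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Hudi.
class CommitReconciler {
    // Returns the commit times that are completed in the dataset but absent
    // from the database, preserving the order of completedCommits.
    static List<String> findUnregistered(List<String> completedCommits,
                                         Set<String> registered) {
        List<String> missing = new ArrayList<>();
        for (String commitTime : completedCommits) {
            if (!registered.contains(commitTime)) {
                missing.add(commitTime);
            }
        }
        return missing;
    }
}
```

This only narrows the window in which the dataset and the DB disagree. It does
not make the commit and the callback atomic, which is why we are asking about
support in Hudi itself.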
Thanks.