[
https://issues.apache.org/jira/browse/HUDI-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-5077:
--------------------------------------
Description:
Currently, only a single deltastreamer can write to a given Hudi table. The
community has asked for support for two deltastreamers writing to a single
table.
Things required to be fixed:
# Checkpointing needs to support multiple key-value pairs, where the key is a
unique identifier for the deltastreamer client and the value is that client's
checkpoint. We might need to introduce a new notion of identifier for each
deltastreamer in this case.
# Within delta sync, after writeClient.upsert and before calling
writeClient.commit, we need to update the checkpoint value. For this, we might
need to take a lock, fetch the latest checkpoint from the timeline (since there
could be multiple writers), update the checkpoint, and then release the lock.
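To make the two items concrete, here is a minimal sketch of the proposed flow,
written in Python only for brevity. Everything in it is hypothetical: Timeline,
commit_with_checkpoint, resume_checkpoint and the writer ids are illustrative
stand-ins, not Hudi APIs, and an in-process threading.Lock stands in for
whatever table-level lock the real implementation would take.

```python
import json
import threading


class Timeline:
    """Hypothetical stand-in for the table timeline: an ordered list of commits."""

    def __init__(self):
        self.commits = []
        self.lock = threading.Lock()  # stands in for a table-level lock

    def latest_checkpoints(self):
        """Return the per-writer checkpoint map stored in the newest commit."""
        if not self.commits:
            return {}
        return json.loads(self.commits[-1]["checkpoints"])


def commit_with_checkpoint(timeline, writer_id, new_checkpoint):
    """Runs after the upsert: merge this writer's checkpoint, then commit."""
    with timeline.lock:
        # Re-read under the lock, since another writer may have committed
        # after this writer finished its upsert.
        checkpoints = timeline.latest_checkpoints()
        checkpoints[writer_id] = new_checkpoint
        timeline.commits.append(
            {
                "writer": writer_id,
                "checkpoints": json.dumps(checkpoints, sort_keys=True),
            }
        )
    return checkpoints


def resume_checkpoint(timeline, writer_id):
    """Checkpoint a writer should resume from, or None if it never committed."""
    return timeline.latest_checkpoints().get(writer_id)
```

Because each commit carries the full merged map, a writer that restarts only
needs to read the latest commit to find its own checkpoint, even if the most
recent commit was made by the other writer.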
These are the changes I can think of. While implementing this, some more minor
fixes may turn out to be required.
Ask from a user: https://github.com/apache/hudi/issues/6718
> Supporting multiple deltastreamers writing to a single hudi table
> -----------------------------------------------------------------
>
> Key: HUDI-5077
> URL: https://issues.apache.org/jira/browse/HUDI-5077
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: sivabalan narayanan
> Priority: Major
>