[ 
https://issues.apache.org/jira/browse/HUDI-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-5077:
--------------------------------------
    Description: 
Currently, only a single deltastreamer can write to a given Hudi table. The 
community has asked for support for two deltastreamers writing to a single 
table. 

 

Things required to be fixed:
 # Checkpointing needs to hold multiple key-value pairs, where the key is a 
unique identifier for the deltastreamer client and the value is that client's 
checkpoint. We might need to introduce a new notion of identifier for each 
deltastreamer in this case.
 # Within delta sync, after writeClient.upsert and before writeClient.commit, 
we need to update the checkpoint value. For this, we might need to take a 
lock, fetch the latest checkpoint from the timeline (since there could be 
multiple writers), update the checkpoint, and then release the lock. 
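
The two items above can be sketched roughly as follows. This is only a 
minimal illustration under stated assumptions, not Hudi code: the class 
MultiWriterCheckpointSketch, its methods, the writer ids and the checkpoint 
strings are all hypothetical, an in-memory list stands in for the timeline, 
and a java.util.concurrent ReentrantLock stands in for whatever table-level 
lock would actually be used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Hudi's API. An in-memory list stands in for the
// timeline and a ReentrantLock stands in for the table-level lock.
final class MultiWriterCheckpointSketch {

  // Each "commit" stores the full writerId -> checkpoint map as of that commit.
  private final List<Map<String, String>> timeline = new ArrayList<>();
  private final ReentrantLock tableLock = new ReentrantLock();

  /** Returns the checkpoint map of the latest commit, or an empty map. */
  Map<String, String> latestCheckpoints() {
    tableLock.lock();
    try {
      if (timeline.isEmpty()) {
        return Collections.emptyMap();
      }
      return timeline.get(timeline.size() - 1);
    } finally {
      tableLock.unlock();
    }
  }

  /**
   * Stand-in for the step between upsert and commit: take the lock, re-read
   * the latest checkpoints (another writer may have committed meanwhile),
   * overwrite only this writer's entry, record the commit, release the lock.
   */
  void commitWithCheckpoint(String writerId, String newCheckpoint) {
    tableLock.lock();
    try {
      // The lock is reentrant, so calling latestCheckpoints() here is safe.
      Map<String, String> merged = new TreeMap<>(latestCheckpoints());
      merged.put(writerId, newCheckpoint);
      timeline.add(Collections.unmodifiableMap(merged));
    } finally {
      tableLock.unlock();
    }
  }

  public static void main(String[] args) {
    MultiWriterCheckpointSketch table = new MultiWriterCheckpointSketch();
    table.commitWithCheckpoint("deltastreamer-a", "offset-100");
    table.commitWithCheckpoint("deltastreamer-b", "offset-7");
    table.commitWithCheckpoint("deltastreamer-a", "offset-250");
    // Each writer's entry survives the other writer's commits.
    System.out.println(table.latestCheckpoints());
  }
}
```

The point of re-reading under the lock is that a writer never overwrites 
another writer's checkpoint with a stale copy. It only replaces its own entry 
in the map that was most recently committed.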

 

These are the changes I can think of. While implementing this, some more minor 
fixes may turn out to be required. 

 

ask from a user: https://github.com/apache/hudi/issues/6718

 

  was:
Currently, only a single deltastreamer can write to a given Hudi table. The 
community has asked for support for two deltastreamers writing to a single 
table. 

 

Things required to be fixed:
 # Checkpointing needs to hold multiple key-value pairs, where the key is a 
unique identifier for the deltastreamer client and the value is that client's 
checkpoint. We might need to introduce a new notion of identifier for each 
deltastreamer in this case.
 # Within delta sync, after writeClient.upsert and before writeClient.commit, 
we need to update the checkpoint value. For this, we might need to take a 
lock, fetch the latest checkpoint from the timeline (since there could be 
multiple writers), update the checkpoint, and then release the lock. 

 

These are the changes I can think of. While implementing this, some more minor 
fixes may turn out to be required. 

 


> Supporting multiple deltastreamers writing to a single hudi table
> -----------------------------------------------------------------
>
>                 Key: HUDI-5077
>                 URL: https://issues.apache.org/jira/browse/HUDI-5077
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
