[jira] [Updated] (HUDI-2461) Support multi-writer for metadata table

sivabalan narayanan (Jira) Mon, 20 Sep 2021 08:36:04 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


sivabalan narayanan updated HUDI-2461:
--------------------------------------
    Description: 
Even with the synchronous patch, we instantiate the metadata table in
single-writer mode only.

But we need to support async compaction and cleaning, so we need to think
about supporting multi-writer down the line.

Details:

All writes to the metadata table happen within the data table lock, including
compaction and cleaning in the metadata table, since we run them inline. But
as we scale the metadata table infra with more indexes, we need to support
async compaction and cleaning, and so we need multi-writer support.

One possibility:
 - Special transaction management for metadata table. 

Data table commits: all writes to the metadata table will be guarded by the
data table lock (regular writes, clustering, compaction, everything). Regular
writes will do the usual conflict resolution, whereas compaction and
clustering may not.

Now coming to metadata table commits: in general there won't be any conflict
resolution for the metadata table as a whole, but we will ensure every commit
happens while holding a lock. Our presumption is that all conflict resolution
will have happened within the data table before proceeding to make a commit in
the metadata table, so we don't need any conflict resolution specifically for
the metadata table.

Scheduling of compaction and cleaning will happen along with regular upserts,
and we will have async compaction and cleaning support. So when these async
operations are looking to commit in the metadata table, they will acquire the
lock, make the commit, and release the lock. Only one writer will be in
progress during a metadata commit.

 

 

  was:
Even with the synchronous patch, we instantiate the metadata table in
single-writer mode only.

But we need to support async compaction and cleaning, so we need to think
about supporting multi-writer down the line.

 

Details:

All writes to the metadata table happen within the data table lock, including
compaction and cleaning in the metadata table, since we run them inline. But
as we scale the metadata table infra with more indexes, we need to support
async compaction and cleaning, and so we need multi-writer support.

One possibility:

- Special transaction management for metadata table. 

Data table commits: all writes to the metadata table will be guarded by the
data table lock (regular writes, clustering, compaction, everything). Regular
writes will do the usual conflict resolution, whereas compaction and
clustering may not.

Now coming to metadata table commits: in general there won't be any conflict
resolution for the metadata table as a whole, but we will ensure every commit
happens while holding a lock.

Scheduling of compaction and cleaning will happen along with regular upserts,
and we will have async compaction and cleaning support. So when these async
operations are looking to commit in the metadata table, they will acquire the
lock, make the commit, and release the lock. Only one writer will be in
progress during a metadata commit.

 

 


> Support multi-writer for metadata table
> ---------------------------------------
>
>                 Key: HUDI-2461
>                 URL: https://issues.apache.org/jira/browse/HUDI-2461
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Writer Core
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> Even with the synchronous patch, we instantiate the metadata table in
> single-writer mode only.
> But we need to support async compaction and cleaning, so we need to think
> about supporting multi-writer down the line.
> Details:
> All writes to the metadata table happen within the data table lock,
> including compaction and cleaning in the metadata table, since we run them
> inline. But as we scale the metadata table infra with more indexes, we need
> to support async compaction and cleaning, and so we need multi-writer
> support.
> One possibility:
>  - Special transaction management for metadata table.
> Data table commits: all writes to the metadata table will be guarded by the
> data table lock (regular writes, clustering, compaction, everything).
> Regular writes will do the usual conflict resolution, whereas compaction and
> clustering may not.
> Now coming to metadata table commits: in general there won't be any conflict
> resolution for the metadata table as a whole, but we will ensure every
> commit happens while holding a lock. Our presumption is that all conflict
> resolution will have happened within the data table before proceeding to
> make a commit in the metadata table, so we don't need any conflict
> resolution specifically for the metadata table.
> Scheduling of compaction and cleaning will happen along with regular
> upserts, and we will have async compaction and cleaning support. So when
> these async operations are looking to commit in the metadata table, they
> will acquire the lock, make the commit, and release the lock. Only one
> writer will be in progress during a metadata commit.
>  
>  
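
The locking scheme proposed in the description could be sketched roughly as
follows. This is only an illustration under stated assumptions, not Hudi code:
the class name, the method names, the in-memory list standing in for the
metadata table timeline, and the use of a JVM-local ReentrantLock in place of
the data table lock are all hypothetical. It shows a regular write doing its
conflict check under the lock before committing to the metadata table, and an
async compaction or cleaning commit that skips conflict resolution and only
acquires the lock, commits, and releases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names are actual Hudi APIs.
class MetadataCommitSketch {
    // Stand-in for the data table lock that guards every metadata table write.
    private final ReentrantLock tableLock = new ReentrantLock();
    // Stand-in for the metadata table timeline.
    private final List<String> metadataTimeline = new ArrayList<>();

    /** Regular data table write: resolve conflicts, then commit to the metadata table. */
    void commitDataTableWrite(String instant, boolean hasConflict) {
        tableLock.lock();
        try {
            // Conflict resolution happens here, against the data table only.
            if (hasConflict) {
                throw new IllegalStateException("Conflict detected for instant " + instant);
            }
            metadataTimeline.add("deltacommit:" + instant);
        } finally {
            tableLock.unlock();
        }
    }

    /** Async compaction or cleaning in the metadata table: lock, commit, release. */
    void commitAsyncTableService(String action, String instant) {
        tableLock.lock();
        try {
            // No conflict resolution: it is presumed to have happened in the data table.
            metadataTimeline.add(action + ":" + instant);
        } finally {
            tableLock.unlock();
        }
    }

    /** Returns a snapshot of the commits made so far. */
    List<String> timeline() {
        tableLock.lock();
        try {
            return new ArrayList<>(metadataTimeline);
        } finally {
            tableLock.unlock();
        }
    }
}
```

Because both paths take the same lock, only one writer is ever in the middle of
a metadata commit, which is the property the description asks for. A real
deployment with writers in separate processes would need a distributed lock
rather than an in-process one.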



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
