[
https://issues.apache.org/jira/browse/HUDI-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-2461:
--------------------------------------
Description:
Even with the synchronous patch, we instantiate the metadata table in
single-writer mode only.
But we need to support async compaction and cleaning, and hence we need to
think about supporting multi-writer down the line.
Details:
All writes to the metadata table happen within the data table lock, including
compaction and cleaning in the metadata table, since we run them inline. But
as we scale the metadata table infra with more indexes, we need to support
async compaction and cleaning, and so we need multi-writer support.
One possibility:
- Special transaction management for the metadata table.
Data table commits: all writes to the metadata table will be guarded by the
data table lock (regular writes, clustering, compaction, everything). Regular
writes will do the usual conflict resolution, whereas compaction and
clustering may not.
Metadata table commits: in general there won't be any conflict resolution for
the metadata table as a whole, but we will ensure that every commit happens
while holding a lock. Our presumption is that all conflict resolution has
already happened within the data table before a commit is made to the metadata
table, so no conflict resolution is needed specifically for the metadata
table.
Scheduling of compaction and cleaning will happen along with regular upserts,
and we will have async compaction and cleaning support. When these async
operations are ready to commit to the metadata table, they will acquire the
lock, make the commit, and release the lock. Only one writer will be in
progress during a metadata commit.
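The proposal above can be sketched as a small in-memory model. This is only an
illustration under stated assumptions, not Hudi code: TableSketch,
ConflictError, commit_to_data_table, commit_metadata_service and seen_commits
are hypothetical names, and the sketch assumes that data table commits and
async metadata table services share one lock. Regular upserts run conflict
resolution under the lock; metadata table commits only take the lock.

```python
import threading


class ConflictError(Exception):
    """A regular write overlapped a commit that landed after the writer started."""


class TableSketch:
    """Hypothetical in-memory model; none of these names are Hudi APIs."""

    def __init__(self):
        # Assumption: one lock guards data table and metadata table commits.
        self.lock = threading.Lock()
        self.data_commits = []      # (action, frozenset of file group ids)
        self.metadata_commits = []  # (action, origin)

    def commit_to_data_table(self, action, file_groups, seen_commits):
        """Commit to the data table; seen_commits is how many data table
        commits were visible when this writer started."""
        touched = frozenset(file_groups)
        with self.lock:
            # Only regular writes resolve conflicts; compaction and
            # clustering skip this step in the sketch.
            if action == "upsert":
                for other_action, other_groups in self.data_commits[seen_commits:]:
                    if touched & other_groups:
                        raise ConflictError(f"{action} overlaps {other_action}")
            # The metadata table write happens inside the same lock,
            # with no conflict resolution of its own.
            self.metadata_commits.append(("deltacommit", action))
            self.data_commits.append((action, touched))

    def commit_metadata_service(self, action):
        """Async compaction or cleaning on the metadata table:
        acquire the lock, commit, release. No conflict resolution."""
        with self.lock:
            self.metadata_commits.append((action, "async"))


table = TableSketch()
table.commit_to_data_table("upsert", ["fg-1"], seen_commits=0)

# Two async table services commit to the metadata table concurrently;
# the lock serializes them, so only one writer is in progress at a time.
services = [
    threading.Thread(target=table.commit_metadata_service, args=(name,))
    for name in ("compaction", "clean")
]
for service in services:
    service.start()
for service in services:
    service.join()

print(len(table.metadata_commits))  # 3
```

In this sketch a second upsert that started before the first one committed and
touches the same file group raises ConflictError, while a compaction on the
same file group goes through, matching the "may not" in the description.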
was:
Even with the synchronous patch, we instantiate the metadata table in
single-writer mode only.
But we need to support async compaction and cleaning, and hence we need to
think about supporting multi-writer down the line.
Details:
All writes to the metadata table happen within the data table lock, including
compaction and cleaning in the metadata table, since we run them inline. But
as we scale the metadata table infra with more indexes, we need to support
async compaction and cleaning, and so we need multi-writer support.
One possibility:
- Special transaction management for the metadata table.
Data table commits: all writes to the metadata table will be guarded by the
data table lock (regular writes, clustering, compaction, everything). Regular
writes will do the usual conflict resolution, whereas compaction and
clustering may not.
Metadata table commits: in general there won't be any conflict resolution for
the metadata table as a whole, but we will ensure that every commit happens
while holding a lock.
Scheduling of compaction and cleaning will happen along with regular upserts,
and we will have async compaction and cleaning support. When these async
operations are ready to commit to the metadata table, they will acquire the
lock, make the commit, and release the lock. Only one writer will be in
progress during a metadata commit.
> Support multi-writer for metadata table
> ---------------------------------------
>
> Key: HUDI-2461
> URL: https://issues.apache.org/jira/browse/HUDI-2461
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Writer Core
> Reporter: sivabalan narayanan
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)