[
https://issues.apache.org/jira/browse/HUDI-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy updated HUDI-2488:
-------------------------------------
Status: Open (was: In Progress)
> Support async metadata index creation while regular writers and table
> services are in progress
> ----------------------------------------------------------------------------------------------
>
> Key: HUDI-2488
> URL: https://issues.apache.org/jira/browse/HUDI-2488
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: sivabalan narayanan
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: image-2021-11-17-11-04-09-713.png
>
>
> For now, we have only the FILES partition in the metadata table, and our
> suggestion is to stop all processes and then restart them one by one with the
> metadata table enabled. The first process to start back up will invoke
> bootstrapping of the metadata table.
>
> But this may not work out well as we add more and more partitions to the
> metadata table.
> We need to support bootstrapping one or more partitions in the metadata table
> while regular writers and table services are in progress.
>
>
> Penning down my thoughts/ideas.
> I tried to find a way to get this done without adding an additional lock, but
> could not crack that. So, here is one way to support async bootstrap.
>
> Introduce a file called "available_partitions" at some special location under
> the metadata table. This file will contain the list of partitions that are
> available to apply updates from the data table. i.e., when we do synchronous
> updates from the data table to the metadata table, and there are N partitions
> in the metadata table, we need to know which partitions are fully bootstrapped
> and ready to take updates. This file will assist in maintaining that info. We
> can debate how to maintain this info (table props, a separate file, etc., but
> for now let's say this file is the source of truth). The idea here is that any
> async bootstrap process will update this file with the newly bootstrapped
> partition once its bootstrap is fully complete, so that all other writers will
> know which partitions to update.
> And we need to introduce a metadata_lock as well.
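> (Purely illustrative: if we keep it as a simple newline-delimited file,
> available_partitions could look like the below; the partition names are
> examples only.)
> {code}
> files
> column_stats
> {code}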
>
> Here is how writers and async bootstrap will pan out.
>
> Regular writer or any async table service (compaction, etc.):
> when changes are required to be applied to the metadata table: // fyi, as of
> today this already happens within the data table lock.
>     Take metadata_lock.
>     Read contents of available_partitions.
>     Prep records and apply updates to the metadata table.
>     Release lock.
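> A minimal Java sketch of this writer-side flow, assuming the metadata_lock is
> exposed as a plain lock object and that prepAndApplyUpdates stands in for the
> actual record preparation (all class and method names here are hypothetical,
> not existing Hudi APIs):
> {code:java}
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantLock;
>
> public class WriterSideMetadataUpdate {
>
>   // Stands in for the proposed metadata_lock; in practice this would be a
>   // distributed lock, not an in-process ReentrantLock.
>   private static final ReentrantLock METADATA_LOCK = new ReentrantLock();
>
>   static void applyToMetadataTable(Path availablePartitionsFile, String instantTime)
>       throws Exception {
>     METADATA_LOCK.lock();                                // take metadata_lock
>     try {
>       // read contents of available_partitions: only these partitions are
>       // ready for direct updates from the data table
>       List<String> readyPartitions = Files.readAllLines(availablePartitionsFile);
>       for (String partition : readyPartitions) {
>         prepAndApplyUpdates(partition, instantTime);     // prep records and apply updates
>       }
>     } finally {
>       METADATA_LOCK.unlock();                            // release lock
>     }
>   }
>
>   private static void prepAndApplyUpdates(String partition, String instantTime) {
>     // placeholder for building the metadata records and committing them
>     System.out.println("Applying " + instantTime + " to metadata partition " + partition);
>   }
> }
> {code}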
>
> Async bootstrap process:
> Start bootstrapping of a given partition (e.g. files) in the metadata table.
>     Do it in a loop, i.e., the first iteration of bootstrap could take 10
>     mins, say; then we catch up the new commits that happened during those 10
>     mins, which could take 1 min for instance; and then we go for another
>     round. Whenever the total bootstrap time for a round is ~1 min or less,
>     in the next round we can go in for the final iteration.
>     During the final iteration, take the metadata_lock. // this lock should
>     not be held for more than a few secs.
>         Apply any new commits that happened while the last iteration of
>         bootstrap was running.
>         Update the "available_partitions" file with the partition that got
>         fully bootstrapped.
>         Release lock.
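> A rough Java sketch of this catch-up loop, assuming a 1-minute threshold and
> hypothetical helpers (bootstrapCommitsAfter, markPartitionAvailable) standing
> in for the real bootstrap logic:
> {code:java}
> import java.time.Duration;
> import java.time.Instant;
> import java.util.concurrent.locks.ReentrantLock;
>
> public class AsyncMetadataBootstrap {
>
>   private static final ReentrantLock METADATA_LOCK = new ReentrantLock();
>   private static final Duration FINAL_ROUND_THRESHOLD = Duration.ofMinutes(1);
>
>   static void bootstrapPartition(String partition) {
>     String caughtUpTo = null;
>     while (true) {
>       Instant roundStart = Instant.now();
>       // catch up all data table commits that landed since the previous round
>       caughtUpTo = bootstrapCommitsAfter(partition, caughtUpTo);
>       Duration roundTime = Duration.between(roundStart, Instant.now());
>       if (roundTime.compareTo(FINAL_ROUND_THRESHOLD) <= 0) {
>         break;  // the last round was fast enough, go in for the final iteration
>       }
>     }
>
>     METADATA_LOCK.lock();  // final iteration: hold the lock only for a few seconds
>     try {
>       // apply any commits that landed while the last round was running
>       bootstrapCommitsAfter(partition, caughtUpTo);
>       // publish the partition so regular writers start updating it directly
>       markPartitionAvailable(partition);
>     } finally {
>       METADATA_LOCK.unlock();
>     }
>   }
>
>   private static String bootstrapCommitsAfter(String partition, String fromInstant) {
>     // placeholder: replay data table commits after fromInstant into the partition
>     return "latestAppliedInstant";
>   }
>
>   private static void markPartitionAvailable(String partition) {
>     // placeholder: append the partition name to the available_partitions file
>   }
> }
> {code}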
>
> metadata_lock: will ensure that, when async bootstrap is in its final stage,
> we do not miss any commit that is nearing completion. So we ought to take a
> lock to ensure we don't miss out on any commits. Either async bootstrap will
> apply the update, or the actual writer itself will update directly if
> bootstrap is fully complete.
>
> Regarding "available_partitions":
> I was looking for a way to know which partitions are fully ready to take in
> direct updates from regular writers and hence chose this approach. We can
> also think about creating a temp partition (files_temp or something) while
> bootstrap is in progress and then renaming it to the original partition name
> once bootstrap is fully complete. If we can reliably rename these partitions
> (i.e., once the files partition is available, it is fully ready to take in
> direct updates), we can take this route as well.
> Here is how it might pan out with folder/partition renaming.
>
> Regular writer or any async table service (compaction, etc.):
> when changes are required to be applied to the metadata table: // fyi, as of
> today this already happens within the data table lock.
>     Take metadata_lock.
>     List partitions in the metadata table; ignore temp partitions.
>     Prep records and apply updates to the metadata table.
>     Release lock.
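> With the rename-based variant, the writer-side change is essentially just the
> partition listing. A small sketch, assuming temp partitions carry a "_temp"
> suffix (the suffix convention and class name are assumptions):
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> import java.util.stream.Collectors;
>
> public class WriterSidePartitionListing {
>
>   // Only partitions without the temp suffix are ready for direct updates.
>   static List<String> readyPartitions(List<String> allMetadataPartitions) {
>     return allMetadataPartitions.stream()
>         .filter(partition -> !partition.endsWith("_temp"))
>         .collect(Collectors.toList());
>   }
>
>   public static void main(String[] args) {
>     // column_stats_temp is still being bootstrapped, so only files takes direct updates
>     System.out.println(readyPartitions(Arrays.asList("files", "column_stats_temp")));
>   }
> }
> {code}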
>
> Async bootstrap process:
> Start bootstrapping of a given partition (e.g. files) in the metadata table.
>     Create a temp folder for the partition that's getting bootstrapped (e.g.
>     files_temp).
>     Do it in a loop, i.e., the first iteration of bootstrap could take 10
>     mins, say; then we catch up the new commits that happened during those 10
>     mins, which could take 1 min for instance; and then we go for another
>     round. Whenever the total bootstrap time for a round is ~1 min or less,
>     in the next round we can go in for the final iteration.
>     During the final iteration, take the metadata_lock. // this lock should
>     not be held for more than a few secs.
>         Apply any new commits that happened while the last iteration of
>         bootstrap was running.
>         Rename files_temp to files.
>         Release lock.
> Note: we just need to ensure that the folder rename is consistent. On a crash,
> either the new folder is fully intact or not available at all; the contents of
> the old folder do not matter.
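> A sketch of how the final iteration with the rename could look, assuming a
> Hadoop FileSystem-style rename that is atomic on the underlying storage; the
> paths and helper names are illustrative:
> {code:java}
> import java.util.concurrent.locks.ReentrantLock;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class FinalizeBootstrapByRename {
>
>   private static final ReentrantLock METADATA_LOCK = new ReentrantLock();
>
>   static void finalizePartition(String metadataBasePath, String partition) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path tempPath = new Path(metadataBasePath, partition + "_temp");  // e.g. files_temp
>     Path finalPath = new Path(metadataBasePath, partition);           // e.g. files
>
>     METADATA_LOCK.lock();  // should be held for only a few seconds
>     try {
>       // apply any commits that landed during the last bootstrap round
>       applyPendingCommits(tempPath);
>       // rename files_temp -> files; this must be atomic so that a crash
>       // leaves either the fully intact new folder or no new folder at all
>       if (!fs.rename(tempPath, finalPath)) {
>         throw new IllegalStateException("Rename to " + finalPath + " failed");
>       }
>     } finally {
>       METADATA_LOCK.unlock();
>     }
>   }
>
>   private static void applyPendingCommits(Path tempPartitionPath) {
>     // placeholder: replay the remaining data table commits into the temp partition
>   }
> }
> {code}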
>
> Failures:
> a. If bootstrap failed midway, as long as "files" hasn't been created yet, we
> can delete files_temp and start all over again.
> b. If bootstrap failed just after the rename, again we should be good. Just
> that the lock may not have been released, and we need to ensure the metadata
> lock is released. So, to tackle this, if acquiring the metadata_lock from a
> regular writer fails, we will just proceed to listing partitions and applying
> updates, as sketched below.
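> For case (b), the writer-side fallback could look roughly like this (the
> timeout value and helper names are illustrative assumptions):
> {code:java}
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.locks.ReentrantLock;
>
> public class WriterLockFallback {
>
>   private static final ReentrantLock METADATA_LOCK = new ReentrantLock();
>
>   static void applyToMetadataTable(String instantTime) throws InterruptedException {
>     // If the lock cannot be acquired (e.g. bootstrap crashed right after the
>     // rename and never released it), proceed anyway: listing the partitions
>     // and applying updates is still safe because the rename already completed.
>     boolean locked = METADATA_LOCK.tryLock(30, TimeUnit.SECONDS);
>     try {
>       listPartitionsAndApplyUpdates(instantTime);
>     } finally {
>       if (locked) {
>         METADATA_LOCK.unlock();
>       }
>     }
>   }
>
>   private static void listPartitionsAndApplyUpdates(String instantTime) {
>     // placeholder: list metadata partitions (ignoring *_temp) and apply updates
>   }
> }
> {code}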
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)