[ 
https://issues.apache.org/jira/browse/HUDI-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439599#comment-17439599
 ] 

Vinoth Chandar edited comment on HUDI-2475 at 11/7/21, 2:12 AM:
----------------------------------------------------------------

I agree with your assessment above that rolling upgrades are not feasible for multi-writer deployments or 
async/separate deployments of cleaning/compaction/clustering. Writing this up 
per deployment type for clarity.
h2. Deltastreamer continuous mode & Spark Streaming with in-writer process 
async table services, with no other writers

No action needed, rolling upgrades work just fine.

(I looked at the code to ensure the writer's write client gets created first, 
which performs the upgrade (deleting the existing metadata table, then rebuilding it), 
before any async scheduling/execution happens.)
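
For reference, a minimal sketch of this deployment type (the flags shown are from the HoodieDeltaStreamer CLI; paths and table names are placeholders, and exact options may vary by version):

```shell
# Single-writer Deltastreamer in continuous mode, with in-process async table
# services. Rolling upgrade: redeploy this one job with the 0.10 bundle; the
# write client performs the metadata table upgrade (drop + rebuild) on startup,
# before any async scheduling/execution begins.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type MERGE_ON_READ \
  --target-base-path s3://bucket/path/to/table \
  --target-table my_table \
  --continuous \
  --hoodie-conf hoodie.metadata.enable=true
```
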
h2. Multiple writers (or) Single writers with table services running 
out-of-writer process asynchronously

*1) Stop all writers and table services*
 - If you are running without the metadata table enabled, this step is not necessary. 
 - However, when the metadata table is ultimately turned on, we would need to stop 
all writers and table services. 

*2) Bring up writers serially; the first writer will upgrade to 0.10*
 * First, bring up a single writer. Bootstrapping the metadata table brings it 
up to speed with the latest completed instant time on the data timeline, which would be 
farther along than any table service. 
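
A sketch of step 2, assuming a Spark-based ingestion job (the job jar and table name are hypothetical; `hoodie.metadata.enable` is the actual Hudi config key):

```shell
# Step 2 sketch: start exactly ONE writer on the 0.10 bundle with metadata
# enabled; its first commit performs the upgrade and bootstraps the metadata
# table, so budget extra time for that commit.
# (my-ingestion-job.jar is a placeholder for your own writer application;
# how it consumes these writer options is application-specific.)
spark-submit \
  --jars hudi-spark-bundle_2.12-0.10.0.jar \
  my-ingestion-job.jar

# Hudi writer options the job should carry:
#   hoodie.metadata.enable=true
# Only after this first commit completes successfully, bring up the
# remaining writers one by one.
```
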

*3) Deploy 0.10 to table services and redeploy*
 * The main question to answer here: when a compaction t1 is 
re-attempted and the metadata table has already been bootstrapped to t4 (because the 
data timeline is further along), will the compaction/clustering be able to 
complete and write to the metadata timeline?

*4) Do not enable metadata on query side until 1-3 are successfully completed*
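
On the query side, step 4 amounts to keeping metadata-based listing off until steps 1-3 finish. A sketch (the config key is real; the session and path are placeholders, shown as Scala comments):

```shell
# Step 4 sketch: readers stay on plain filesystem listing during steps 1-3.
spark-shell --jars hudi-spark-bundle_2.12-0.10.0.jar
# then, inside the session (Scala, shown as comments):
#   spark.read.format("hudi")
#     .option("hoodie.metadata.enable", "false")  // flip to true only after 1-3
#     .load("s3://bucket/path/to/table")
```
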

As usual, it's recommended that readers are upgraded to 0.10 bundles prior to 
steps 1-3. Query engines currently don't throw hard errors if higher-version code 
reads a lower-version table, since most older readers still work in many cases. 
h2. Follow ups

1. Can an older metadata table reader (<= 0.9) read the new synchronous metadata 
table? This still needs to be answered. 

2. Can a pending compaction complete (since it will create an older commit on 
the metadata timeline) after upgrading to 0.10?

3. We need to understand how cheap having metadata enabled is on the query side 
when no metadata table is in place, i.e. we would fail and fall back to 
`FileSystemBackedTableMetadata`, but does that involve the cost of opening 
connections/making fs calls each time? We need to tighten this up in 0.10 before we 
can decide whether to turn metadata on by default for readers.
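
One cheap way to probe follow-up 3 (table path is hypothetical; this assumes reading with `hoodie.metadata.enable=true` against a table with no metadata table exercises the fallback path):

```shell
# Follow-up 3 sketch: time the same count with metadata listing on vs off,
# against a table that has NO metadata table yet, to estimate what the
# FileSystemBackedTableMetadata fallback costs on the read path.
spark-shell --jars hudi-spark-bundle_2.12-0.10.0.jar
# inside the session (Scala, shown as comments):
#   spark.time(spark.read.format("hudi")
#     .option("hoodie.metadata.enable", "true")   // forces the fallback path
#     .load("s3://bucket/path/to/table").count())
#   spark.time(spark.read.format("hudi")
#     .option("hoodie.metadata.enable", "false")  // plain fs-listing baseline
#     .load("s3://bucket/path/to/table").count())
```
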


was (Author: vc):
I agree with your assessment above that for multi-writer deployments or 
async/separate deployments of cleaning/compaction/clustering. 

Writing this up per deployment type for clarity 
{quote}{color:#FF0000}Still WIP {color}
{quote}
h2. Deltastreamer continuous mode & Spark Streaming with in-writer process 
async table services, with no other writers

No action needed, rolling upgrades work just fine.

(I looked at the code to ensure the writer's write client gets created first, 
which will perform upgrade (delete existing metadata table, then rebuild it), 
before any async scheduling/execution happens)
h2. Multiple writers (or) Single writers with table services running 
out-of-writer process asynchronously

*1) Stop all writers and table services*
 - If metadata enabled,
 - if metadata enabled later. 

*2) Bring up writers serially, first writer will upgrade to 0.10*

*3) Deploy 0.10 to table services and redeploy*

*4) Do not enable metadata on query side until 1-3 are successfully completed*

As usual, it's recommended that readers are upgraded to 0.10 bundles prior to 
1-3
 - Need to check what errors are thrown if not across engines.

 

> Rolling Upgrade downgrade story for 0.10 & enabling metadata
> ------------------------------------------------------------
>
>                 Key: HUDI-2475
>                 URL: https://issues.apache.org/jira/browse/HUDI-2475
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: Vinoth Chandar
>            Priority: Blocker
>             Fix For: 0.10.0
>
>
> Upgrade downgrade infra for enabling metadata.
>  
> If user is having a writer process and clustering/compaction running async.
>  
>  - New synchronous metadata design, has a constraint that once metadata table 
> is bootstrapped, all commits will happen synchronously. In other words, there 
> is no catch up business wrt datatable.  
> So, it may not be feasible to do rolling upgrade (i.e. upgrade writer first 
> while async compaction is running) and then upgrade async compaction. 
> Bootstrap has to be done by stopping all processes and then we can restart 
> all other processes one by one (by using the upgraded hudi library) w/ 
> metadata enabled.  
> This is the only viable option I can think of. 
> 1. Stop all processes. Upgrade to hudi to a version w/ synchronous metadata. 
> bring up one writer process w/ metadata config enabled. this will bootstrap 
> the metadata table. and from there on, any new commits by the writer will do 
> synchronous updates to metadata.  
> Note: users can choose to upgrade via hudi-cli if need be. but easier would 
> be to just start the writer. Expect some delay for first commit since 
> bootstrap will be happening. 
> 2. Once first commit in previous writer process completes successfully, we 
> can restart all other processes. Upgrade the async table service (to hudi 
> version w/ metadata enabled) and restart it. *Ensure metadata table is 
> enabled across all processes.*  Even if missed on one, could result in data 
> loss.
>  
> By this, once metadata table is bootstrapped, any new commits from all 
> processes will be synced to metadata. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)