[ 
https://issues.apache.org/jira/browse/HUDI-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3828:
----------------------------------
    Description: 
Currently, block merging is configurable to be either lazy or non-lazy. However, 
the non-lazy sequence is incorrect: it merges blocks before actually rolling 
back the reverted ones. To make sure users do not accidentally hit this issue, 
we need to revisit the MOR block-merging sequence and make sure that the 
following invariants are upheld:
 # Blocks are merged in 2 passes:
 ## First, we load all blocks, dropping the rolled-back ones; then
 ## We merge them in another forward pass
 # We should try to avoid having 2 merging sequences and instead consolidate on 
just one: right now we have "block + block" and "base + block", but we should 
be able to get away with just the latter (this will simplify the merging 
sequence quite substantially, e.g. with respect to the handling of deletions) 
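
To illustrate the two invariants above, here is a minimal sketch in Python. It 
is not Hudi code: the Block class, the block kinds, and the function names are 
all hypothetical, and records are modelled as a plain key -> value dict. It only 
shows the intended ordering, i.e. rolled-back blocks are dropped before any 
merging happens, and the only merge operation is "base + block".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical block kinds; the names are illustrative, not Hudi's.
DATA, DELETE, ROLLBACK = "data", "delete", "rollback"


@dataclass
class Block:
    kind: str
    instant: str  # instant time of the write that produced this block
    records: Dict[str, str] = field(default_factory=dict)  # used by DATA
    deleted_keys: List[str] = field(default_factory=list)  # used by DELETE
    target_instant: Optional[str] = None                   # used by ROLLBACK


def load_valid_blocks(blocks: List[Block]) -> List[Block]:
    """Pass 1: scan all blocks and drop those whose instant was rolled back."""
    rolled_back = {b.target_instant for b in blocks if b.kind == ROLLBACK}
    return [
        b for b in blocks
        if b.kind != ROLLBACK and b.instant not in rolled_back
    ]


def merge_onto_base(base: Dict[str, str], blocks: List[Block]) -> Dict[str, str]:
    """Pass 2: fold the surviving blocks onto the base, oldest first."""
    merged = dict(base)  # copy, so the caller's base is left untouched
    for block in blocks:
        if block.kind == DATA:
            merged.update(block.records)
        elif block.kind == DELETE:
            for key in block.deleted_keys:
                merged.pop(key, None)
    return merged


def merge_log(base: Dict[str, str], blocks: List[Block]) -> Dict[str, str]:
    """Two-pass merge: filter rolled-back blocks first, then merge forward."""
    return merge_onto_base(base, load_valid_blocks(blocks))


base = {"k1": "v0", "k2": "v0"}
log = [
    Block(DATA, "001", records={"k1": "v1"}),
    Block(DELETE, "002", deleted_keys=["k2"]),
    Block(ROLLBACK, "003", target_instant="002"),
    Block(DATA, "004", records={"k3": "v4"}),
]
result = merge_log(base, log)
```

In this example the delete written at instant 002 is rolled back by the block at 
003, so k2 survives and the result is {"k1": "v1", "k2": "v0", "k3": "v4"}. A 
sequence that merged each block as soon as it was read would already have 
removed k2 by the time it reached the rollback block.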

  was:
We need to revisit the MOR block-merging sequence and make sure that the 
following invariants are upheld:
 # Blocks have to be merged in a backward pass (i.e. we first fetch all the 
blocks, and merge them in the reverse order of their timeline)
 # We should try to avoid having 2 merging sequences and instead consolidate on 
just one: right now we have "block + block" and "base + block", but we should 
be able to get away with just the latter (this will simplify the merging 
sequence quite substantially, e.g. with respect to the handling of deletions) 


> We need to revisit MOR block merging sequence
> ---------------------------------------------
>
>                 Key: HUDI-3828
>                 URL: https://issues.apache.org/jira/browse/HUDI-3828
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.14.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
