[ 
https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-5464:
--------------------------------------
    Description: 
While instantiating a new partition in the metadata table (MDT), we re-use the 
same instant time as the commit being applied to MDT. This needs to be fixed. 

 

For example, say we have 10 commits with FILES already enabled. 

For C11, we enable col-stats. 

After the data table (DT) work is done, when we enter metadata writer 
instantiation, we deduce that col-stats has to be instantiated and then 
instantiate it using DC11. In the MDT timeline, we see dc11.req, dc11.inflight 
and dc11.complete. We then go ahead and apply the actual C11 from DT to MDT 
(dc11.inflight and dc11.complete are updated). Here, we overwrite the same DC11 
with records pertaining to C11. 

This is a bug, and we definitely need to fix it. 
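
The collision described above can be pictured with a toy model. This is only a 
sketch and not Hudi code: the MDT timeline is reduced to a map keyed by instant 
time, and the class name, method name and action strings are all hypothetical. 
It shows that two operations written under the same instant time leave a single 
entry, with the second replacing the first. 

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only (assumption): the MDT timeline is a map keyed by instant time.
class InstantCollisionDemo {

    // Replays the two MDT operations from the example under one instant time.
    static Map<String, String> replay(String instantTime) {
        Map<String, String> mdtTimeline = new LinkedHashMap<>();
        // Step 1: the new col-stats partition is initialized using the instant.
        mdtTimeline.put(instantTime, "initialize col-stats partition");
        // Step 2: the actual commit from DT is applied under the same instant,
        // which replaces the entry written in step 1.
        mdtTimeline.put(instantTime, "apply records of C11 from DT");
        return mdtTimeline;
    }

    public static void main(String[] args) {
        Map<String, String> timeline = replay("11");
        // Only one entry is left, and it belongs to step 2.
        System.out.println(timeline.size() + " entry: " + timeline);
    }
}
```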

We can add a suffix to C11 (say C11_003 or C11_001), as we do for compaction and 
clean in MDT, so that any additional operation in MDT has a different commit 
time format. Everything else should match DT one-to-one. 
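
A minimal sketch of the suffix idea, under stated assumptions: the helper name 
initInstantFor, the constant INIT_SUFFIX and its value "003" are hypothetical, 
and the "_" separator only mirrors the C11_003 notation used above rather than 
any actual Hudi instant format. The timeline is again a toy map keyed by 
instant time. 

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only (assumption): not the Hudi API, all names here are hypothetical.
class SuffixedInstantDemo {

    // Hypothetical suffix reserved for partition initialization in MDT.
    static final String INIT_SUFFIX = "003";

    // Derives a distinct MDT instant time for initialization from a DT instant.
    static String initInstantFor(String dtInstant) {
        return dtInstant + "_" + INIT_SUFFIX;
    }

    // Replays initialization and the regular commit with distinct instants.
    static Map<String, String> replay(String dtInstant) {
        Map<String, String> mdtTimeline = new LinkedHashMap<>();
        // The extra MDT-only operation gets the suffixed instant time.
        mdtTimeline.put(initInstantFor(dtInstant), "initialize col-stats partition");
        // The regular commit keeps the DT instant time, matching DT one-to-one.
        mdtTimeline.put(dtInstant, "apply records of " + dtInstant + " from DT");
        return mdtTimeline;
    }

    public static void main(String[] args) {
        // Both operations survive because their keys differ.
        System.out.println(replay("C11"));
    }
}
```

With distinct keys, the initialization and the regular commit no longer share 
an entry, which is the property the proposal is after. 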

 

  was:
While instantiating a new partition in the metadata table (MDT), we re-use the 
same instant time as the commit being applied to MDT. This needs to be fixed. 

 

For example, say we have 10 commits with FILES already enabled. 

For C11, we enable col-stats. 

After the data table (DT) work is done, when we enter metadata writer 
instantiation, we deduce that col-stats has to be instantiated and then 
instantiate it using DC11. We then go ahead and apply the actual C11 from DT to 
MDT. Here, we overwrite the same DC11 with records pertaining to C11. 

This is a bug, and we definitely need to fix it. 

We can add a suffix to C11 (say C11_003 or C11_001), as we do for compaction and 
clean in MDT, so that any additional operation in MDT has a different commit 
time format. Everything else should match DT one-to-one. 

 


> Fix instantiation of a new partition in MDT re-using the same instant time as 
> a regular commit
> ----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5464
>                 URL: https://issues.apache.org/jira/browse/HUDI-5464
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> While instantiating a new partition in the metadata table (MDT), we re-use 
> the same instant time as the commit being applied to MDT. This needs to be 
> fixed. 
>  
> For example, say we have 10 commits with FILES already enabled. 
> For C11, we enable col-stats. 
> After the data table (DT) work is done, when we enter metadata writer 
> instantiation, we deduce that col-stats has to be instantiated and then 
> instantiate it using DC11. In the MDT timeline, we see dc11.req, 
> dc11.inflight and dc11.complete. We then go ahead and apply the actual C11 
> from DT to MDT (dc11.inflight and dc11.complete are updated). Here, we 
> overwrite the same DC11 with records pertaining to C11. 
> This is a bug, and we definitely need to fix it. 
> We can add a suffix to C11 (say C11_003 or C11_001), as we do for compaction 
> and clean in MDT, so that any additional operation in MDT has a different 
> commit time format. Everything else should match DT one-to-one. 
>  



