[ https://issues.apache.org/jira/browse/HUDI-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3279:
----------------------------------
    Description: 
While working on [https://github.com/apache/hudi/pull/4556], I stumbled 
upon an issue where the LogBlock Scanner hits EOF on log files in tests after 
performing a Restore operation.

The root cause turned out to be the Metadata Table (MT) storing incorrect file 
sizes after Restore (sizes in the MT are essentially 2x what is in the FS):

!Screen Shot 2022-01-19 at 12.17.21 PM.png!

!Screen Shot 2022-01-19 at 12.18.27 PM.png!

 

This seems to occur due to the following:
 # The Metadata Table treats new records for the same file as "deltas", adding 
the incoming file size to the size already stored in its records 
([REF|https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java#L227])
 # Upon Restore (which is handled simply as a collection of Rollbacks), we pick 
the *max* of the file sizes before and after the operation, regardless of 
which state we're actually rolling back to 
([REF|https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L254]).
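The two causes above can be modeled with a minimal Java sketch. All class, method, and file names here are hypothetical (they are not Hudi's APIs), and the sizes are made up. The sketch assumes the MT already holds the file's correct size and that the restore path then emits that same absolute size, which a delta-style merge adds on top of the stored value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the two behaviours described above; names are
// illustrative only and are NOT Hudi's actual classes or methods.
class DeltaMergeSketch {

  // Cause 1: a record for a file that is already present is treated as a
  // "delta", so its size is added to the size already stored.
  static void mergeAsDelta(Map<String, Long> stored, String file, long incomingSize) {
    stored.merge(file, incomingSize, Long::sum);
  }

  // Cause 2: on rollback, the size recorded for a file is the max of its
  // size before and after the operation, regardless of the restore target.
  static long rollbackSizeByMax(long sizeBefore, long sizeAfter) {
    return Math.max(sizeBefore, sizeAfter);
  }

  public static void main(String[] args) {
    // Assumed scenario: MT and FS both report the log file as 100 bytes.
    Map<String, Long> metadataSizes = new HashMap<>();
    metadataSizes.put(".f1.log.1", 100L);
    long fsSize = 100L;

    // Restore emits an absolute size for the same file (assumed inputs)...
    long emitted = rollbackSizeByMax(40L, 100L);
    // ...which the delta-style merge then adds on top of the stored size.
    mergeAsDelta(metadataSizes, ".f1.log.1", emitted);

    System.out.println("MT size = " + metadataSizes.get(".f1.log.1") + ", FS size = " + fsSize);
  }
}
```

Under these assumed inputs, running `main` prints `MT size = 200, FS size = 100`, matching the 2x ratio described above.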

 

*Proposal*

Instead of always picking the max size, we should pick the size the file had 
right before the operation being rolled back.
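A minimal sketch of the proposal's selection rule follows. It assumes a hypothetical per-file size history keyed by Hudi instant time (not an existing Hudi structure) and reads "right before" as "at the latest instant before the one being rolled back". It only contrasts which size gets picked; how the chosen size is then merged into the MT record is out of scope.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the proposal: on rollback, pick the size the file had
// before the rolled-back instant instead of the max. Names are illustrative only.
class PreRollbackSizeSketch {

  // Returns the size recorded at the latest instant strictly before
  // `rolledBackInstant`, or 0 if no earlier size is known (an assumption).
  static long sizeBefore(NavigableMap<String, Long> history, String rolledBackInstant) {
    Map.Entry<String, Long> prior = history.lowerEntry(rolledBackInstant);
    return prior == null ? 0L : prior.getValue();
  }

  // The current max-based choice, for comparison.
  static long maxSize(NavigableMap<String, Long> history) {
    return history.values().stream().mapToLong(Long::longValue).max().orElse(0L);
  }

  public static void main(String[] args) {
    // Instant times are assumed to be fixed-width timestamps, so
    // lexicographic order matches chronological order.
    NavigableMap<String, Long> history = new TreeMap<>();
    history.put("20220119120000", 40L);  // size after an earlier commit
    history.put("20220119121500", 100L); // size after the commit being rolled back

    String rolledBack = "20220119121500";
    System.out.println("max = " + maxSize(history)
        + ", before rolled-back instant = " + sizeBefore(history, rolledBack));
  }
}
```

With this made-up history, `main` prints `max = 100, before rolled-back instant = 40`: the max-based choice keeps the size written by the commit that is being undone.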

 



> Metadata table stores incorrect file sizes after Restore
> --------------------------------------------------------
>
>                 Key: HUDI-3279
>                 URL: https://issues.apache.org/jira/browse/HUDI-3279
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0
>
>         Attachments: Screen Shot 2022-01-19 at 12.17.21 PM.png, Screen Shot 2022-01-19 at 12.18.27 PM.png
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
