[jira] [Commented] (HUDI-8445) Fetch schema from log file while computing col stats

Y Ethan Guo (Jira) Mon, 28 Oct 2024 10:17:04 -0700


    [ https://issues.apache.org/jira/browse/HUDI-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893546#comment-17893546 ]


Y Ethan Guo commented on HUDI-8445:
-----------------------------------

Log blocks and log files can be written by different commits with different 
writer schemas when the table schema evolves.  Strictly speaking, the column 
stats of a log file should cover only the columns that exist in that file.  
Using the table schema here may add column stats entries for columns that 
are absent from the log file, which makes the stats hard to interpret.
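
To make the concern concrete, here is a toy sketch in Python. It is not Hudi 
code: plain dicts stand in for log records, plain lists stand in for schemas, 
and column_stats is a hypothetical helper invented for this illustration. It 
assumes a log file written before a "city" column was added to the table.

```python
# Toy illustration only; none of these names are Hudi APIs.
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Stats = Dict[str, Tuple[Optional[Any], Optional[Any], int]]


def column_stats(records: List[Record], columns: List[str]) -> Stats:
    """Return {column: (min, max, null_count)} for the requested columns."""
    stats: Stats = {}
    for col in columns:
        values = [r.get(col) for r in records]
        non_null = [v for v in values if v is not None]
        stats[col] = (
            min(non_null) if non_null else None,
            max(non_null) if non_null else None,
            len(values) - len(non_null),
        )
    return stats


# Hypothetical log file written by an older commit, before "city" existed.
log_file_schema = ["id", "fare"]
# Hypothetical latest table schema after schema evolution.
table_schema = ["id", "fare", "city"]
log_records = [{"id": 1, "fare": 10.5}, {"id": 2, "fare": 7.0}]

# Stats driven by the schema stored with the log file: only real columns.
stats_from_log_schema = column_stats(log_records, log_file_schema)
# Stats driven by the table schema: an extra all-null entry for "city".
stats_from_table_schema = column_stats(log_records, table_schema)

print(stats_from_log_schema)
print(stats_from_table_schema)
```

In this sketch the table-schema variant reports a "city" entry with no min, 
no max, and every value counted as null, even though the log file never 
contained that column.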

> Fetch schema from log file while computing col stats
> ----------------------------------------------------
>
>                 Key: HUDI-8445
>                 URL: https://issues.apache.org/jira/browse/HUDI-8445
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 1.1.0
>
>
> We are using the table schema resolver to fetch the writer schema for 
> reading log files while computing col stats. Ref: [https://github.com/apache/hudi/pull/12105] 
>  
> Let's follow up to see if we can fetch the schema directly from the log 
> file rather than using the table schema resolver. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
