[
https://issues.apache.org/jira/browse/HUDI-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893546#comment-17893546
]
Y Ethan Guo commented on HUDI-8445:
-----------------------------------
Log blocks and log files can be written by different commits with different
writer schemas when there is schema evolution. Strictly speaking, the
column stats of a log file should only contain the columns that exist in that
file. Using the table schema here may introduce additional column stats
entries for columns that do not exist in the log file, which makes the stats
hard to interpret.
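
To make the concern concrete, here is a minimal sketch in plain Python. It is
not Hudi code: the column_stats helper, the column names and the records are
all hypothetical, and schemas are simplified to lists of column names. It only
illustrates how driving col stats from the table schema can produce an entry
for a column that the older log file never contained.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
# column name -> (min, max, null_count)
ColStats = Dict[str, Tuple[Optional[Any], Optional[Any], int]]


def column_stats(records: List[Record], columns: List[str]) -> ColStats:
    """Hypothetical helper: compute (min, max, null_count) per requested column."""
    stats: ColStats = {}
    for col in columns:
        values = [r.get(col) for r in records]
        present = [v for v in values if v is not None]
        stats[col] = (
            min(present) if present else None,
            max(present) if present else None,
            len(values) - len(present),
        )
    return stats


# Assumed scenario: the table gained a "tip" column after this log file was written.
table_schema = ["id", "fare", "tip"]
log_file_schema = ["id", "fare"]  # writer schema of the older log file

log_file_records = [{"id": 1, "fare": 10.5}, {"id": 2, "fare": 7.0}]

# Driven by the table schema: "tip" gets an all-null entry although the file has no such column.
stats_from_table_schema = column_stats(log_file_records, table_schema)

# Driven by the log file's own schema: only columns that exist in the file appear.
stats_from_file_schema = column_stats(log_file_records, log_file_schema)
```

In this sketch the table-schema variant reports "tip" with no min or max and a
null count equal to the record count, which is indistinguishable from a column
that exists but holds only nulls.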
> Fetch schema from log file while computing col stats
> ----------------------------------------------------
>
> Key: HUDI-8445
> URL: https://issues.apache.org/jira/browse/HUDI-8445
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: sivabalan narayanan
> Priority: Major
> Fix For: 1.1.0
>
>
> We are using the table schema resolver to fetch the writer schema for
> reading log files while computing col stats.
> Ref: [https://github.com/apache/hudi/pull/12105]
>
> Let's follow up to see if we can just fetch the schema from the log file
> directly rather than using the table schema resolver.
>