[
https://issues.apache.org/jira/browse/HUDI-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan closed HUDI-8555.
-------------------------------------
Resolution: Fixed
> Fix nested field col stats generation for log files
> ----------------------------------------------------
>
> Key: HUDI-8555
> URL: https://issues.apache.org/jira/browse/HUDI-8555
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 1.0.0
>
>
> Out of the box, we generate col stats only for top-level fields, but users
> do have an option to override the columns for which Hudi should generate
> col stats.
>
> When we tested with a nested field, we realized that we have a gap here. Hudi
> does generate col stats properly for base files, even for nested fields, but
> col stats are not generated for log files.
> [https://github.com/apache/hudi/blob/fa5878d9c46f5c824ae56a9ad56ef90b0bc37a19/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L443]
>
> The linked code snippet honors only top-level fields.
>
> So, we have two fixes here.
> Fix1: Let's avoid generating stats for nested fields, even for base files.
> Also, throw an exception if someone explicitly sets a nested field with
> "hoodie.metadata.index.column.stats.column.list".
> Fix2: Follow up to support nested field col stats generation.
>
> Fix1 is a blocker for the 1.0 release. Maybe we can punt Fix2 for later.
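
A rough sketch of the validation described in Fix1, for illustration only. It is not the actual Hudi change: the class and method names are hypothetical, and it assumes a nested field is written as a dot-separated path (e.g. "address.city") in the comma-separated column list. Only the config key is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi: rejects nested fields in the
// user-supplied column stats column list.
final class ColumnStatsColumnListCheck {

    // Config key quoted in the issue description.
    static final String COLUMN_LIST_KEY =
        "hoodie.metadata.index.column.stats.column.list";

    // Returns the entries that look like nested field paths.
    // Assumption: a nested field is identified by a '.' in its name.
    static List<String> findNestedColumns(String commaSeparatedColumns) {
        List<String> nested = new ArrayList<>();
        if (commaSeparatedColumns == null) {
            return nested;
        }
        for (String raw : commaSeparatedColumns.split(",")) {
            String column = raw.trim();
            if (!column.isEmpty() && column.contains(".")) {
                nested.add(column);
            }
        }
        return nested;
    }

    // Throws if the configured list names any nested field.
    static void validate(String commaSeparatedColumns) {
        List<String> nested = findNestedColumns(commaSeparatedColumns);
        if (!nested.isEmpty()) {
            throw new IllegalArgumentException(
                "Nested fields are not supported in " + COLUMN_LIST_KEY + ": " + nested);
        }
    }

    public static void main(String[] args) {
        validate("id,ts"); // top-level fields only, passes silently
        try {
            validate("id,address.city");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast at config validation time would keep base files and log files consistent until Fix2 lands, rather than silently producing stats for only one of the two.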