sivabalan narayanan created HUDI-8555:
-----------------------------------------
Summary: Add support for nested field col stats generation for log files
Key: HUDI-8555
URL: https://issues.apache.org/jira/browse/HUDI-8555
Project: Apache Hudi
Issue Type: Improvement
Components: metadata
Reporter: sivabalan narayanan
Out of the box, we generate col stats only for top-level fields, but users
have the option to override the columns for which Hudi generates col stats.
When we tested with a nested field, we found a gap. Hudi generates col stats
correctly for base files, even for nested fields, but no col stats are
generated for log files.
[https://github.com/apache/hudi/blob/fa5878d9c46f5c824ae56a9ad56ef90b0bc37a19/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L443]
The linked code honors only top-level fields.
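To illustrate the kind of gap described above, here is a minimal sketch in plain Java. It is not the Hudi code at the link: records are modeled as nested maps, and the class name, the method names, and the dot-separated path convention are all assumptions made for illustration. It only shows why a lookup that treats the column name as a single key finds nothing for a nested field, while a lookup that walks the path does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Hudi.
final class NestedFieldLookupSketch {

  // Top-level-only lookup: the whole column name is used as one key,
  // so a name such as "address.zip" never matches a nested value.
  static Object topLevelLookup(Map<String, Object> record, String column) {
    return record.get(column);
  }

  // Nested lookup: split the name on '.' (assumed path separator)
  // and walk into nested maps one level at a time.
  static Object nestedLookup(Map<String, Object> record, String column) {
    Object current = record;
    for (String part : column.split("\\.")) {
      if (!(current instanceof Map)) {
        return null; // path goes deeper than the data does
      }
      current = ((Map<?, ?>) current).get(part);
    }
    return current;
  }

  public static void main(String[] args) {
    Map<String, Object> address = new HashMap<>();
    address.put("zip", 94105);

    Map<String, Object> record = new HashMap<>();
    record.put("id", 1);
    record.put("address", address);

    System.out.println(topLevelLookup(record, "address.zip")); // no match
    System.out.println(nestedLookup(record, "address.zip"));   // nested value
  }
}
```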
So, we have two fixes here.
Fix 1: Let's avoid generating col stats for nested fields even for base files.
Also, throw an exception if someone explicitly sets a nested field with
"hoodie.metadata.index.column.stats.column.list".
Fix 2: Follow up to support nested field col stats generation.
Fix 1 is a blocker for the 1.0 release. Maybe we can punt Fix 2 for later.
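The validation part of the first fix could look roughly like the sketch below. This is not the actual patch. The class and method names are hypothetical, IllegalArgumentException stands in for whatever exception type Hudi would use, and the sketch assumes the config value is a comma-separated list in which a nested field is written as a dot-separated path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: this class is not part of Hudi.
final class ColumnStatsColumnListValidator {

  // Config key as quoted in this issue.
  static final String COLUMN_LIST_KEY =
      "hoodie.metadata.index.column.stats.column.list";

  // Parses the configured column list and rejects nested fields.
  // Assumption: columns are comma-separated and a '.' marks a nested path.
  static List<String> parseAndValidate(String columnList) {
    List<String> columns = new ArrayList<>();
    if (columnList == null || columnList.trim().isEmpty()) {
      return columns; // nothing configured, caller falls back to defaults
    }
    for (String raw : columnList.split(",")) {
      String column = raw.trim();
      if (column.isEmpty()) {
        continue; // tolerate stray commas
      }
      if (column.contains(".")) {
        throw new IllegalArgumentException(
            "Nested field '" + column + "' is not supported in "
                + COLUMN_LIST_KEY + "; only top-level fields are allowed");
      }
      columns.add(column);
    }
    return columns;
  }

  public static void main(String[] args) {
    System.out.println(parseAndValidate("id, ts"));
    try {
      parseAndValidate("id, address.zip");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast at config parsing time, rather than silently skipping the column, keeps base files and log files consistent until nested field support lands.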
--
This message was sent by Atlassian Jira
(v8.20.10#820010)