sivabalan narayanan created HUDI-8555:
-----------------------------------------

             Summary: Add support for nested field col stats generation for log files
                 Key: HUDI-8555
                 URL: https://issues.apache.org/jira/browse/HUDI-8555
             Project: Apache Hudi
          Issue Type: Improvement
          Components: metadata
            Reporter: sivabalan narayanan


Out of the box, we generate col stats only for top-level fields, but users do 
have the option to override the columns for which Hudi generates col stats.

 

When we tested with a nested field, we realized that we have a gap here. Hudi 
does generate col stats for base files properly, even for nested fields, but 
col stats are not generated for log files. 

[https://github.com/apache/hudi/blob/fa5878d9c46f5c824ae56a9ad56ef90b0bc37a19/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L443]
 

The linked code snippet will only honor top level fields. 
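
One way to picture the gap is the sketch below. It does not reproduce the 
linked Hudi code: plain maps stand in for a record, and the class name, method 
names, and the dot-separated path convention ("address.city") are assumptions 
made for illustration only. A lookup that treats the configured name as a 
single top-level key finds nothing for a nested field, while a lookup that 
walks the path segment by segment does.

```java
import java.util.Map;

// Illustrative only: plain maps stand in for a record; this is not Hudi's code.
final class NestedFieldLookup {

  // Top-level lookup: treats the whole configured name as a single key.
  static Object topLevelOnly(Map<String, Object> record, String field) {
    return record.get(field);
  }

  // Path lookup: walks one map level per dot-separated segment.
  // Assumption: nested fields are addressed with dot notation.
  static Object byPath(Map<String, Object> record, String path) {
    Object current = record;
    for (String part : path.split("\\.")) {
      if (!(current instanceof Map)) {
        return null; // the path goes deeper than the record does
      }
      current = ((Map<?, ?>) current).get(part);
    }
    return current;
  }

  public static void main(String[] args) {
    Map<String, Object> record =
        Map.of("id", 1, "address", Map.of("city", "Pune"));
    System.out.println(topLevelOnly(record, "address.city")); // prints null
    System.out.println(byPath(record, "address.city"));       // prints Pune
  }
}
```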

 

So, we have two fixes here. 

Fix1: Let's avoid generating nested field stats even for base files. Also throw 
an exception if someone explicitly sets a nested field with 
"hoodie.metadata.index.column.stats.column.list". 
Fix2: Follow-up to support nested field col stats generation. 
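
The validation in Fix1 could look roughly like the sketch below. This is not 
the actual Hudi change: the class and method names are hypothetical, the 
exception type is a placeholder (Hudi may well use its own exception class), 
and it assumes that a nested field is written with dot notation in a 
comma-separated column list. Only the config key is taken from this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi: illustrates the Fix1 check only.
final class ColumnStatsColumnListValidator {

  // Config key quoted in this issue.
  static final String COLUMN_LIST_KEY =
      "hoodie.metadata.index.column.stats.column.list";

  // Returns the entries of a comma-separated column list that look nested.
  // Assumption: a nested field is written with dot notation, e.g. "address.city".
  static List<String> findNestedFields(String columnList) {
    List<String> nested = new ArrayList<>();
    if (columnList == null || columnList.trim().isEmpty()) {
      return nested;
    }
    for (String raw : columnList.split(",")) {
      String column = raw.trim();
      if (column.contains(".")) {
        nested.add(column);
      }
    }
    return nested;
  }

  // Fails fast when the user explicitly lists a nested field.
  static void validate(String columnList) {
    List<String> nested = findNestedFields(columnList);
    if (!nested.isEmpty()) {
      throw new IllegalArgumentException(
          "Nested fields are not supported in " + COLUMN_LIST_KEY + ": " + nested);
    }
  }

  public static void main(String[] args) {
    validate("id,price"); // top-level fields only: passes silently
    try {
      validate("id,address.city");
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Failing at config validation time, rather than silently skipping the column, 
keeps base files and log files consistent until Fix2 lands.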

 

Fix1 is a blocker for the 1.0 release. Maybe we can punt Fix2 for later. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
