Alexey Kudinkin created HUDI-4245:
-------------------------------------
Summary: Support nested fields in Column Stats Index
Key: HUDI-4245
URL: https://issues.apache.org/jira/browse/HUDI-4245
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Currently, the Column Stats Index supports only root-level fields. There is no
reason we cannot support nested fields as well, given that columnar file
formats store nested fields as _nested columns_, i.e. as columns named by the
field together with the struct it belongs to.
For example, take the following schema:
{code:java}
c1: StringType
c2: StructType(Seq(StructField("foo", StringType))){code}
It would be stored in Parquet as "c1: string" and "c2.foo: string". This means
Parquet already collects statistics for all nested fields, and we just need to
make sure we propagate them into the Column Stats Index.
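
To make the idea concrete, here is a minimal sketch of how nested fields map to dotted leaf column paths and how per-column statistics could be keyed by those paths. It is illustrative only and uses neither Hudi nor Parquet APIs: the dict-based schema model and the helper names (leaf_column_paths, get_nested, column_stats) are hypothetical, and the min/max/null_count layout is an assumption rather than the actual Column Stats Index format.

```python
# Hypothetical illustration only: this is not Hudi or Parquet code.
# A schema is modelled as a dict mapping a field name to either a type
# name (leaf field) or another dict (struct with nested fields).

def leaf_column_paths(schema, prefix=""):
    """Yield (dotted_path, type_name) for every leaf field, depth-first."""
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field_type, dict):
            # Struct: descend and prefix child names with the struct path.
            yield from leaf_column_paths(field_type, path)
        else:
            yield path, field_type


def get_nested(record, dotted_path):
    """Return the value at dotted_path, or None if any part is missing or null."""
    value = record
    for part in dotted_path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def column_stats(records, schema):
    """Compute min, max and null count for every leaf column path."""
    stats = {}
    for path, _ in leaf_column_paths(schema):
        values = [get_nested(record, path) for record in records]
        non_null = [v for v in values if v is not None]
        stats[path] = {
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
            "null_count": len(values) - len(non_null),
        }
    return stats


# The schema from the example above: c1 is a string, c2 is a struct with "foo".
schema = {"c1": "string", "c2": {"foo": "string"}}
records = [
    {"c1": "a", "c2": {"foo": "x"}},
    {"c1": "b", "c2": {"foo": "z"}},
    {"c1": "c", "c2": None},  # a null struct counts as a null for c2.foo
]

paths = list(leaf_column_paths(schema))
stats = column_stats(records, schema)
```

In this sketch the nested field is addressed exactly like a root-level one, just with a dotted name ("c2.foo"), so an index keyed by column name would need no special casing beyond accepting such names.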
--
This message was sent by Atlassian Jira
(v8.20.7#820007)