[
https://issues.apache.org/jira/browse/HUDI-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-4245:
-----------------------------
Component/s: metadata
> Support nested fields in Column Stats Index
> -------------------------------------------
>
> Key: HUDI-4245
> URL: https://issues.apache.org/jira/browse/HUDI-4245
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Critical
> Fix For: 0.13.1
>
>
> Currently, the Column Stats Index supports only root-level fields. There is
> no reason we cannot also support nested fields, given that columnar file
> formats store nested fields as _nested columns_, i.e. as columns named by
> the field together with the struct it belongs to.
>
> For example, the following schema:
> {code:java}
> c1: StringType
> c2: StructType(Seq(StructField("foo", StringType))){code}
> would be stored in Parquet as "c1: string", "c2.foo: string". This means
> Parquet already collects statistics for all nested fields, and we just need
> to make sure we propagate them into the Column Stats Index.
>
> Original GH issue:
> [https://github.com/apache/hudi/issues/5804#issuecomment-1152983029]
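
The flattening described in the issue, where a struct field such as c2.foo becomes its own leaf column, can be illustrated with a minimal sketch. This is not Hudi or Parquet code: the dict-based schema representation and the function name leaf_column_paths are hypothetical, chosen only to show how nested fields map to dotted column names that per-column statistics could be keyed by.

```python
from typing import Dict, List, Union

# Hypothetical schema representation (an assumption for illustration only):
# a leaf field maps its name to a type name, a struct field maps its name
# to a nested dict of fields.
Schema = Dict[str, Union[str, "Schema"]]


def leaf_column_paths(schema: Schema, prefix: str = "") -> List[str]:
    """Return the dotted paths of all leaf fields, in declaration order."""
    paths: List[str] = []
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field_type, dict):
            # Struct field: descend, carrying the parent path as the prefix.
            paths.extend(leaf_column_paths(field_type, path))
        else:
            # Leaf field: this is a column that can carry its own statistics.
            paths.append(path)
    return paths


# The schema from the issue: c1 is a string, c2 is a struct with one string field.
schema: Schema = {"c1": "string", "c2": {"foo": "string"}}
print(leaf_column_paths(schema))  # ['c1', 'c2.foo']
```

Under this sketch, an index restricted to root-level fields would cover only c1, while one keyed by leaf paths would also cover c2.foo.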