[
https://issues.apache.org/jira/browse/IMPALA-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404526#comment-17404526
]
Attila Jeges commented on IMPALA-10879:
---------------------------------------
CR: https://gerrit.cloudera.org/#/c/17806/
> Add parquet stats to iceberg manifest
> -------------------------------------
>
> Key: IMPALA-10879
> URL: https://issues.apache.org/jira/browse/IMPALA-10879
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Affects Versions: Impala 4.0.0
> Reporter: Attila Jeges
> Assignee: Attila Jeges
> Priority: Major
> Labels: impala-iceberg
>
> Parquet stats should be written to iceberg manifest as per-datafile metrics.
> This task is specifically about the following metrics:
> - column_sizes : Map from column id to the total size on disk of all regions
> that store the column. Does not include bytes necessary to read other
> columns, like footers. Leave null for row-oriented formats.
> - null_value_counts : Map from column id to number of null values in the
> column.
> - lower_bounds : Map from column id to lower bound in the column serialized
> as binary. Each value must be less than or equal to all non-null, non-NaN
> values in the column for the file.
> - upper_bounds : Map from column id to upper bound in the column serialized
> as binary. Each value must be greater than or equal to all non-null, non-NaN
> values in the column for the file.
> Iceberg manifest doc:
> https://iceberg.apache.org/spec/#manifests
> lower_bounds and upper_bounds values should be serialized to binary using
> the spec's single-value serialization:
> https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
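As an illustration of the metrics described above, here is a minimal Python sketch of how per-column null counts and binary-serialized bounds might be derived for one data file. It is not the Impala implementation: the function names (serialize_single_value, column_metrics) and the type-name strings are hypothetical, and only a few primitive types are covered. It assumes the single-value serialization rules for those types: little-endian fixed-width encodings for int, long, float and double, and raw UTF-8 bytes for string.

```python
import math
import struct

# struct format codes for the fixed-width types handled in this sketch
# (all little-endian: int = 4 bytes, long = 8, float = 4, double = 8).
_PACK_FORMATS = {"int": "<i", "long": "<q", "float": "<f", "double": "<d"}


def serialize_single_value(type_name, value):
    """Serialize one value to bytes (hypothetical helper, partial type coverage)."""
    if type_name == "boolean":
        return b"\x01" if value else b"\x00"
    if type_name == "string":
        return value.encode("utf-8")  # no length prefix
    if type_name in _PACK_FORMATS:
        return struct.pack(_PACK_FORMATS[type_name], value)
    raise ValueError("type not covered by this sketch: " + type_name)


def column_metrics(type_name, values):
    """Return (null_value_count, lower_bound, upper_bound) for one column.

    Bounds ignore nulls and NaNs and are None when no usable value exists.
    """
    null_count = sum(1 for v in values if v is None)
    usable = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not usable:
        return null_count, None, None
    return (
        null_count,
        serialize_single_value(type_name, min(usable)),
        serialize_single_value(type_name, max(usable)),
    )
```

For example, column_metrics("int", [3, None, 1, 7]) yields a null count of 1 and the 4-byte little-endian encodings of 1 and 7 as the bounds. In a manifest entry, these results would be keyed by column id in the null_value_counts, lower_bounds and upper_bounds maps.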