Attila Jeges created IMPALA-10879:
-------------------------------------
Summary: Add parquet stats to iceberg manifest
Key: IMPALA-10879
URL: https://issues.apache.org/jira/browse/IMPALA-10879
Project: IMPALA
Issue Type: Improvement
Components: Backend, Frontend
Affects Versions: Impala 4.0.0
Reporter: Attila Jeges
Assignee: Attila Jeges
Parquet column statistics should be written to the Iceberg manifest as per-data-file metrics.
This task is specifically about the following metrics:
- column_sizes : Map from column id to the total size on disk of all regions
that store the column. Does not include bytes necessary to read other columns,
like footers. Leave null for row-oriented formats.
- null_value_counts : Map from column id to number of null values in the column.
- lower_bounds : Map from column id to lower bound in the column serialized as
binary. Each value must be less than or equal to all non-null, non-NaN values
in the column for the file.
- upper_bounds : Map from column id to upper bound in the column serialized as
binary. Each value must be greater than or equal to all non-null, non-NaN
values in the column for the file.
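
As an illustration only, here is a minimal sketch of how per-row-group Parquet column statistics might be folded into the four per-file maps listed above. The `ColumnChunkStats` and `DataFileMetrics` types and the `aggregate_metrics` function are hypothetical names invented for this example and are not Impala or Iceberg APIs. The sketch assumes integer column values, so NaN handling is not shown, and it assumes that "size on disk" means the compressed size of each column chunk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnChunkStats:
    """Hypothetical statistics for one column chunk in one row group."""
    field_id: int                     # Iceberg column id
    total_compressed_size: int        # assumed to be the on-disk size in bytes
    null_count: int
    min_value: Optional[int] = None   # None when the chunk has no min/max stats
    max_value: Optional[int] = None


@dataclass
class DataFileMetrics:
    """Hypothetical container for the four per-file maps, keyed by column id."""
    column_sizes: Dict[int, int] = field(default_factory=dict)
    null_value_counts: Dict[int, int] = field(default_factory=dict)
    lower_bounds: Dict[int, int] = field(default_factory=dict)
    upper_bounds: Dict[int, int] = field(default_factory=dict)


def aggregate_metrics(chunks: List[ColumnChunkStats]) -> DataFileMetrics:
    """Fold the stats of every column chunk in a file into per-file metrics."""
    m = DataFileMetrics()
    for c in chunks:
        fid = c.field_id
        # Sizes and null counts add up across row groups.
        m.column_sizes[fid] = m.column_sizes.get(fid, 0) + c.total_compressed_size
        m.null_value_counts[fid] = m.null_value_counts.get(fid, 0) + c.null_count
        # Bounds widen: keep the smallest min and the largest max seen so far.
        if c.min_value is not None:
            m.lower_bounds[fid] = min(m.lower_bounds.get(fid, c.min_value), c.min_value)
        if c.max_value is not None:
            m.upper_bounds[fid] = max(m.upper_bounds.get(fid, c.max_value), c.max_value)
    return m
```

In this sketch a column with no min/max statistics in any row group simply has no entry in the bounds maps. The bounds are kept as plain values here and would still need to be serialized to binary before being written to the manifest.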
Iceberg manifest doc:
https://iceberg.apache.org/spec/#manifests
lower_bounds and upper_bounds values should be serialized to binary using
Iceberg's single-value serialization:
https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
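
For reference, a rough sketch of what single-value serialization looks like for a few common types, based on my reading of the appendix linked above: fixed-width numeric types are little-endian, strings are raw UTF-8 bytes without a length prefix, and decimals are the unscaled value as big-endian two's complement in the minimum number of bytes. The function names are made up for this example, and the actual Impala change may be structured quite differently.

```python
import struct


def serialize_int(value: int) -> bytes:
    """Iceberg int: 4 bytes, little-endian."""
    return struct.pack("<i", value)


def serialize_long(value: int) -> bytes:
    """Iceberg long: 8 bytes, little-endian."""
    return struct.pack("<q", value)


def serialize_float(value: float) -> bytes:
    """Iceberg float: 4-byte IEEE 754, little-endian."""
    return struct.pack("<f", value)


def serialize_double(value: float) -> bytes:
    """Iceberg double: 8-byte IEEE 754, little-endian."""
    return struct.pack("<d", value)


def serialize_string(value: str) -> bytes:
    """Iceberg string: UTF-8 bytes, no length prefix."""
    return value.encode("utf-8")


def serialize_decimal(unscaled: int) -> bytes:
    """Iceberg decimal: unscaled value as big-endian two's complement,
    using the minimum number of bytes."""
    # Bits needed for the magnitude, excluding the sign bit.
    magnitude_bits = (unscaled if unscaled >= 0 else ~unscaled).bit_length()
    length = magnitude_bits // 8 + 1  # always leaves room for the sign bit
    return unscaled.to_bytes(length, "big", signed=True)
```

For example, `serialize_int(1)` gives the bytes `01 00 00 00`, and `serialize_decimal(-128)` gives the single byte `80`.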