Daniel Becker resolved IMPALA-13370.
------------------------------------
Resolution: Fixed
> Read Puffin stats from metadata.json property if available
> ----------------------------------------------------------
>
> Key: IMPALA-13370
> URL: https://issues.apache.org/jira/browse/IMPALA-13370
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
> Labels: impala-iceberg
>
> When Trino writes Puffin stats for a column, it stores the NDV (number of
> distinct values) as a property in the "statistics" section of the
> metadata.json file, in addition to the Theta sketch in the Puffin file. When
> we only read the stats and do not write or update them, reading this property
> is sufficient if it is present.
> An example of the "statistics" section:
> {code:java}
> "statistics" : [ {
>   "snapshot-id" : 1226095104912303892,
>   "statistics-path" : "hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/metadata/20240829_112839_00004_p6sck-7f433a45-607b-4561-89a3-fc4c58ef60d9.stats",
>   "file-size-in-bytes" : 306,
>   "file-footer-size-in-bytes" : 257,
>   "blob-metadata" : [ {
>     "type" : "apache-datasketches-theta-v1",
>     "snapshot-id" : 1226095104912303892,
>     "sequence-number" : 4,
>     "fields" : [ 1 ],
>     "properties" : {
>       "ndv" : "2"
>     }
>   } ]
> } ]{code}
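
For illustration only, the lookup described in the issue could be sketched roughly as follows. This is not Impala's implementation (the frontend is Java); it is a minimal Python sketch that assumes the metadata.json content has already been parsed into a dict. The function name ndv_from_metadata and the trimmed SAMPLE_METADATA are hypothetical and keep only the keys the function reads. Blobs without a usable "ndv" property are skipped, on the assumption that a reader would then fall back to the Theta sketch in the Puffin file.

```python
# Trimmed copy of the "statistics" section from the issue (hypothetical
# variable name; only the keys read below are kept).
SAMPLE_METADATA = {
    "statistics": [{
        "snapshot-id": 1226095104912303892,
        "blob-metadata": [{
            "type": "apache-datasketches-theta-v1",
            "snapshot-id": 1226095104912303892,
            "sequence-number": 4,
            "fields": [1],
            "properties": {"ndv": "2"},
        }],
    }]
}

THETA_BLOB_TYPE = "apache-datasketches-theta-v1"


def ndv_from_metadata(metadata, snapshot_id):
    """Return {field_id: ndv} for one snapshot, using only the "ndv" property.

    Blobs that are not single-column Theta sketches, or that have no
    parseable "ndv" property, are skipped so that a caller can fall back
    to reading the sketch itself from the Puffin file.
    """
    result = {}
    for stats_file in metadata.get("statistics", []):
        if stats_file.get("snapshot-id") != snapshot_id:
            continue
        for blob in stats_file.get("blob-metadata", []):
            if blob.get("type") != THETA_BLOB_TYPE:
                continue
            fields = blob.get("fields", [])
            ndv = blob.get("properties", {}).get("ndv")
            if len(fields) != 1 or ndv is None:
                continue
            try:
                # The property value is a string in metadata.json.
                result[fields[0]] = int(ndv)
            except ValueError:
                continue
    return result


print(ndv_from_metadata(SAMPLE_METADATA, 1226095104912303892))  # {1: 2}
```

Note that the NDV is stored as a string ("2") in the example above, so it has to be parsed before use.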