[ 
https://issues.apache.org/jira/browse/IMPALA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-13370.
------------------------------------
    Resolution: Fixed

> Read Puffin stats from metadata.json property if available
> ----------------------------------------------------------
>
>                 Key: IMPALA-13370
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13370
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Major
>              Labels: impala-iceberg
>
> When Trino writes Puffin stats for a column, it stores the NDV as a 
> property in the "statistics" section of the metadata.json file, in addition 
> to the Theta sketch in the Puffin file. When we only read the stats and do 
> not write or update them, reading this property is enough if it is present; 
> the sketch in the Puffin file does not have to be read.
> An example of the "statistics" section:
> {code:java}
> "statistics" : [ {
>     "snapshot-id" : 1226095104912303892,
>     "statistics-path" : "hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/metadata/20240829_112839_00004_p6sck-7f433a45-607b-4561-89a3-fc4c58ef60d9.stats",
>     "file-size-in-bytes" : 306,
>     "file-footer-size-in-bytes" : 257,
>     "blob-metadata" : [ {
>       "type" : "apache-datasketches-theta-v1",
>       "snapshot-id" : 1226095104912303892,
>       "sequence-number" : 4,
>       "fields" : [ 1 ],
>       "properties" : {
>         "ndv" : "2"
>       }
>     } ]
>   } ]{code}
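
The lookup described above could be sketched roughly as follows. This is only an illustration in Python, not the Impala frontend change itself: the function name `ndv_from_metadata`, the trimmed sample document, and the choice to skip blobs that cover more than one field are assumptions made for the example. Only the JSON keys and the blob type string come from the "statistics" section quoted in the issue.

```python
import json

# Assumption: a trimmed copy of the "statistics" section quoted in the issue,
# wrapped in a minimal object so it parses as a standalone document.
METADATA_JSON = """
{
  "statistics": [ {
    "snapshot-id": 1226095104912303892,
    "blob-metadata": [ {
      "type": "apache-datasketches-theta-v1",
      "snapshot-id": 1226095104912303892,
      "sequence-number": 4,
      "fields": [ 1 ],
      "properties": { "ndv": "2" }
    } ]
  } ]
}
"""

THETA_BLOB_TYPE = "apache-datasketches-theta-v1"


def ndv_from_metadata(metadata, snapshot_id):
    """Hypothetical helper: map field id -> NDV for one snapshot.

    Only Theta blobs that carry an "ndv" property are used. Anything else is
    skipped, so a caller could fall back to reading the sketch itself.
    """
    result = {}
    for stats_file in metadata.get("statistics", []):
        if stats_file.get("snapshot-id") != snapshot_id:
            continue
        for blob in stats_file.get("blob-metadata", []):
            if blob.get("type") != THETA_BLOB_TYPE:
                continue
            fields = blob.get("fields", [])
            ndv = blob.get("properties", {}).get("ndv")
            # Assumption: only single-column blobs with an "ndv" value are usable.
            if len(fields) != 1 or ndv is None:
                continue
            try:
                # The property value is stored as a string in metadata.json.
                result[fields[0]] = int(ndv)
            except ValueError:
                continue  # malformed value: treat as if the property were absent
    return result


ndv_by_field = ndv_from_metadata(json.loads(METADATA_JSON), 1226095104912303892)
print(ndv_by_field)
```

For a field with no entry in the returned mapping, a reader would still need the Theta sketch from the Puffin file, which is the case this property shortcut is meant to avoid when the value is present.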



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
