[ https://issues.apache.org/jira/browse/IMPALA-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer updated IMPALA-15106:
-------------------------------------
Description:
The existing theta sketch functions do not work for all types; for example,
there is no support for TIMESTAMP. To use Iceberg's Puffin files to store
incremental NDV stats, we need to support all types currently handled by
COMPUTE STATS. The Iceberg spec clearly defines how to do this for all types:
https://iceberg.apache.org/puffin-spec/#apache-datasketches-theta-v1-blob-type
https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
Note that we may be unable to support nanosecond timestamps; an error could
probably be returned in that case.
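
To illustrate the conversion step, here is a minimal sketch of how individual
values might be turned into bytes before being fed to a theta sketch. It
assumes the byte layouts from my reading of the Iceberg single-value
serialization appendix linked above: little-endian fixed-width integers and
doubles, UTF-8 strings, dates as days since the epoch, and timestamps as
microseconds since the epoch. All function names are hypothetical; this is not
Impala's implementation. Raising an error for sub-microsecond timestamps is
only one possible way to handle the nanosecond case.

```python
import struct
from datetime import date

_EPOCH_DATE = date(1970, 1, 1)


def serialize_int(value: int) -> bytes:
    """INT: 4-byte little-endian (assumed layout)."""
    return struct.pack("<i", value)


def serialize_long(value: int) -> bytes:
    """BIGINT: 8-byte little-endian (assumed layout)."""
    return struct.pack("<q", value)


def serialize_double(value: float) -> bytes:
    """DOUBLE: 8-byte little-endian IEEE 754 (assumed layout)."""
    return struct.pack("<d", value)


def serialize_string(value: str) -> bytes:
    """STRING: UTF-8 bytes without a length prefix (assumed layout)."""
    return value.encode("utf-8")


def serialize_date(value: date) -> bytes:
    """DATE: days since 1970-01-01 as a 4-byte little-endian int."""
    return struct.pack("<i", (value - _EPOCH_DATE).days)


def serialize_timestamp(epoch_seconds: int, nanos: int) -> bytes:
    """TIMESTAMP: microseconds since the epoch as an 8-byte little-endian long.

    The (epoch_seconds, nanos) input shape is a hypothetical stand-in for a
    nanosecond-precision timestamp. Values that cannot be represented in
    microseconds are rejected here instead of being silently truncated.
    """
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in [0, 1e9)")
    if nanos % 1000 != 0:
        raise ValueError("sub-microsecond precision cannot be represented")
    micros = epoch_seconds * 1_000_000 + nanos // 1000
    return struct.pack("<q", micros)
```

The resulting bytes would then be passed to the sketch's update function.
Rejecting rather than truncating keeps two distinct nanosecond values from
being counted as one, at the cost of failing on such data.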
was:
The existing theta sketch functions do not work for all types; for example,
there is no support for TIMESTAMP. To use Iceberg's Puffin files to store
incremental NDV stats, we need to support all types currently handled by
COMPUTE STATS. The Iceberg spec clearly defines how to do this for all types.
Note that we may be unable to support nanosecond timestamps; an error could
probably be returned in that case.
> Support missing types with theta sketches
> -----------------------------------------
>
> Key: IMPALA-15106
> URL: https://issues.apache.org/jira/browse/IMPALA-15106
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Reporter: Csaba Ringhofer
> Priority: Major
> Labels: datasketches, impala-iceberg