Re: [DISCUSS] v4 - Improved column statistics

Eduard Tudenhöfner Wed, 20 Aug 2025 06:05:30 -0700

Hey everyone,

We met yesterday and talked about some details around the stats proposal.

Please find the notes here
<https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?usp=sharing>
and the recording here
<https://drive.google.com/file/d/1YIILCIhDbgu3OYlMn5KNChsYFP8rGPPX/view?usp=sharing>.

I have updated the proposal <https://s.apache.org/iceberg-column-stats>
with the following points:

   - added a table schema example with a detailed stats schema
   - updated wording to make it clear that projection is always by ID and
   the field name of a stats field should not be relied on
   - added a table that defines current field stats types with their
   respective offsets from the field ID of the base stats struct
   - updated wording to make it clear that stats are calculated for
   assigned field IDs that are
      - defined in the table ID space (Amogh is working on a separate
      proposal to unify ID spaces)
      - defined in the reserved field ID
      <https://iceberg.apache.org/spec/#reserved-field-ids> space
   - added some examples showing how a table field ID maps to the field ID
   of its stats struct, and to the field IDs of the individual stats fields
   - updated wording to explain how variant stats would look in the new
   stats structure
   - updated wording to make it clear that custom stats are not supported
   and that expressions are the preferred way
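To illustrate the "projection by ID, plus per-stat offsets" idea, here is a minimal sketch. The offset values, the stat names, and the base stats struct ID 10000 are placeholders I made up for illustration. The real offsets are the ones defined in the table in the proposal, and the helpers stats_field_id/project_stat are hypothetical, not part of any Iceberg API.

```python
# HYPOTHETICAL offsets for illustration only; the actual values are the ones
# defined in the proposal's table of field stats types.
STAT_OFFSETS = {
    "value_count": 1,
    "null_value_count": 2,
    "nan_value_count": 3,
    "lower_bound": 4,
    "upper_bound": 5,
}


def stats_field_id(stats_struct_id: int, stat: str) -> int:
    """Field ID of one stat = field ID of the base stats struct + its offset."""
    return stats_struct_id + STAT_OFFSETS[stat]


def project_stat(stats_row: dict, stats_struct_id: int, stat: str):
    """Project a stat by field ID only; the stats field's name is never consulted."""
    return stats_row.get(stats_field_id(stats_struct_id, stat))


# A stats row keyed purely by field ID (hypothetical base stats struct ID 10000).
row = {10001: 42, 10004: 3, 10005: 97}
print(project_stat(row, 10000, "value_count"))
print(project_stat(row, 10000, "nan_value_count"))
```

Because lookups go through computed IDs, renaming a stats field cannot break readers, which is the point of not relying on field names.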

Please let me know in case I missed anything else to include.

Thanks everyone for participating,

Eduard
