Hey everyone,

It's been a while since the last update but I just wanted to raise awareness that we've been iterating on the Content Stats Spec changes in #14234 <https://github.com/apache/iceberg/pull/14234>. Please take a look and let me know your thoughts.

Thanks
Eduard

On Fri, Sep 19, 2025 at 9:31 AM Eduard Tudenhöfner <[email protected]> wrote:

> Hey everyone,
>
> I have updated the proposal
> <https://docs.google.com/document/d/1uvbrwwAJW2TgsnoaIcwAFpjbhHkBUL5wY_24nKgtt9I/edit?tab=t.0#heading=h.hs6r9d26w1y2>
> with the following things:
>
>    - removed *column_size*, since this hasn't been used anywhere in
>    earlier versions. Please shout if you think we should keep this going
>    forward.
>    - added *avg_value_size* and *max_value_size* for avg/max value sizes
>    of variable-length types (string/binary)
>    - the examples in the proposal were using *1_417_000_000* as the
>    starting stats ID for the reserved field ID space, but that should have
>    been *2_147_000_000* because we have 200 reserved IDs * 200 stats
>    types = 40k and using *2_147_000_000* leaves enough room in case we
>    decide to add other ID spaces
>
> If people are ok then I think we should be able to vote on the design
> proposal so that we could get the first portions of the code
> <https://github.com/apache/iceberg/pull/13933> in, which would allow
> parallelizing downstream work on this
>
>
> Thanks
> Eduard
>
> On Wed, Aug 20, 2025 at 3:05 PM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey everyone,
>>
>> We met yesterday and talked about some details around the stats proposal.
>>
>> Please find the notes here
>> <https://docs.google.com/document/d/1ZK5g8_bA1Y9SQ4UA5jAREX9iNX56xLWA5vAuKpQC4L8/edit?usp=sharing>
>> and the recording here
>> <https://drive.google.com/file/d/1YIILCIhDbgu3OYlMn5KNChsYFP8rGPPX/view?usp=sharing>
>> .
>>
>> I have updated the proposal <https://s.apache.org/iceberg-column-stats>
>> with the following points:
>>
>>    - added a table schema example with a detailed stats schema
>>    - updated wording to make it clear that projection is always by ID
>>    and the field name of a stats field should not be relied on
>>    - added a table that defines current field stats types with their
>>    respective offsets from the field ID of the base stats struct
>>    - updated wording to make it clear that stats are calculated for
>>    assigned field IDs that are
>>       - defined in the table ID space (Amogh is working on a separate
>>       proposal to unify ID spaces)
>>       - defined in the reserved field ID
>>       <https://iceberg.apache.org/spec/#reserved-field-ids> space
>>    - added some examples showing table ID -> stats ID of stats struct
>>    and also the stats ID of individual stats fields
>>    - updated wording to explain how variant stats would look in the new
>>    stats structure
>>    - updated wording to make it clear that custom stats are not
>>    supported and that expressions are the preferred way
>>
>> Please let me know in case I missed anything else to include.
>>
>> Thanks everyone for participating,
>>
>> Eduard
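
As a quick sanity check on the ID arithmetic in the Sep 19 message quoted above, here is a minimal Python sketch. It assumes field IDs are 32-bit signed integers. The constant and function names are made up for illustration and are not taken from the proposal or the Iceberg codebase. It only checks the budget (200 * 200 = 40k) and the headroom left above 2_147_000_000; it does not try to reproduce the proposal's field ID -> stats ID mapping.

```python
# Sketch only: checks the ID-space budget described in the Sep 19 message.
# All names below are hypothetical and chosen for illustration.

INT32_MAX = 2**31 - 1            # assumption: field IDs are 32-bit signed ints

STATS_ID_BASE = 2_147_000_000    # starting stats ID, per the message
RESERVED_FIELD_IDS = 200         # reserved field IDs, per the message
STATS_TYPES = 200                # stats types per field, per the message


def reserved_stats_id_budget() -> int:
    """Number of stats IDs needed to cover the reserved field ID space."""
    return RESERVED_FIELD_IDS * STATS_TYPES


def headroom_after_reserved_stats() -> int:
    """IDs left between the last stats ID in the block and INT32_MAX."""
    last_used = STATS_ID_BASE + reserved_stats_id_budget() - 1
    return INT32_MAX - last_used


if __name__ == "__main__":
    print("budget:", reserved_stats_id_budget())
    print("headroom:", headroom_after_reserved_stats())
```

Under these assumptions the budget is 40_000 IDs and 443_648 IDs remain below the 32-bit limit, which is consistent with the message's point that 2_147_000_000 leaves room for other ID spaces.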
