I can think of a few places where implementations have made decisions correlated with field ID assignment. Specifically, deciding which fields are included in metrics collection by default: I think we once had field ID assignment and that algorithm using the same traversal, but I don't think they match anymore (in Java), and I have no idea what other engines/implementations are doing. So, TL;DR: I don't think it matters.

On Tue, Jan 6, 2026 at 10:04 AM Péter Váry wrote:
> I'm not even sure that the user-generated field IDs are kept. I think
> Iceberg generates the IDs using the util methods in TypeUtil, like
> assignFreshIds, reassignIds, etc.
>
> One similar discussion can be found here:
> https://github.com/apache/iceberg/issues/13164
>
> On Tue, Jan 6, 2026 at 16:19, Maximilian Michels wrote:
>
>> Hi Shawn,
>>
>> To my knowledge, Iceberg does not specify the order of field ID
>> assignments. Moreover, it does not even enforce monotonically increasing
>> field IDs. The only requirement is that each field is uniquely identified
>> by an ID, which remains stable over time.
>>
>> That means users shouldn't make any assumptions about how field IDs
>> are assigned. They should either assign field IDs themselves by supplying a
>> Schema with their own IDs, or look up the generated field IDs by
>> field name.
>>
>> Cheers,
>> Max
>>
>> On Mon, Dec 22, 2025 at 2:37 AM Shawn Chang wrote:
>>
>>> Hi folks,
>>>
>>> I've noticed some interesting differences across Iceberg clients when
>>> assigning new field IDs during schema conversion.
>>>
>>> Specifically:
>>>
>>> 1. *Iceberg Java* assigns field IDs using *ordinal order for the root
>>>    struct*, followed by a *post-order traversal* for nested structs.
>>>    For example:
>>>
>>>    struct<
>>>      0: id: required long,
>>>      1: info: optional struct<
>>>        4: name: optional string,
>>>        5: attrs: optional struct<
>>>          2: age: optional int,
>>>          3: score: optional double
>>>        >
>>>      >
>>>    >
>>>
>>>    Here, nested fields follow a post-order traversal (age → score →
>>>    name → attrs).
>>>
>>> 2. *Iceberg Python* appears to use a *pre-order traversal* when
>>>    assigning fresh field IDs:
>>>    https://github.com/apache/iceberg-python/blob/950fc7131b8e597f73647c6ff2bd78d0b24102ad/pyiceberg/schema.py#L1295
>>>
>>> 3. *Iceberg Rust* does not currently have a helper for schema
>>>    conversion plus field ID assignment, but some existing logic appears
>>>    to follow a *level-order traversal*:
>>>    https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/schema/id_reassigner.rs#L27
>>>
>>> This leads to two questions:
>>>
>>> 1. *Does the assignment order of fresh field IDs actually matter?*
>>>    My intuition is that it should not, as long as the field ID → field
>>>    mapping is consistent and the highest field ID is tracked correctly,
>>>    but I would love to be corrected.
>>>
>>> 2. *If the order does matter, is there a recommended or canonical
>>>    traversal order that clients should follow?*
>>>
>>> Any guidance or historical context would be appreciated. Thanks!
>>>
>>> Best,
>>> Shawn
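To make the three traversal orders concrete, here is a minimal Python sketch. It uses a hypothetical `Field` class (not the pyiceberg, Java, or Rust APIs), models only nested structs (no lists or maps, which also consume IDs in Iceberg), and starts numbering at 0 to mirror the example above. `assign_ordinal_root_post_order` is written to match the numbering shown in that example; `assign_pre_order` and `assign_level_order` are textbook depth-first and breadth-first traversals, not copies of the linked pyiceberg or iceberg-rust code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Field:
    """Hypothetical schema node: a named field that may be a struct with children."""
    name: str
    children: List["Field"] = field(default_factory=list)
    field_id: Optional[int] = None


def example_schema() -> List[Field]:
    # Schema from the example: id, info<name, attrs<age, score>>
    return [
        Field("id"),
        Field("info", [Field("name"), Field("attrs", [Field("age"), Field("score")])]),
    ]


def assign_ordinal_root_post_order(root: List[Field]) -> None:
    """Root fields get their ordinal; each nested struct's fields are numbered
    only after every field beneath them (matches the numbering in the example)."""
    for i, f in enumerate(root):
        f.field_id = i
    next_id = len(root)

    def visit(fields: List[Field]) -> None:
        nonlocal next_id
        for f in fields:
            visit(f.children)  # number all deeper fields first
        for f in fields:
            f.field_id = next_id
            next_id += 1

    for f in root:
        visit(f.children)


def assign_pre_order(root: List[Field]) -> None:
    """Depth-first: a field is numbered before its children."""
    next_id = 0

    def visit(fields: List[Field]) -> None:
        nonlocal next_id
        for f in fields:
            f.field_id = next_id
            next_id += 1
            visit(f.children)

    visit(root)


def assign_level_order(root: List[Field]) -> None:
    """Breadth-first: all fields at depth d are numbered before depth d + 1."""
    next_id = 0
    queue = deque(root)
    while queue:
        f = queue.popleft()
        f.field_id = next_id
        next_id += 1
        queue.extend(f.children)


def ids_by_name(root: List[Field], prefix: str = "") -> Dict[str, int]:
    """Map dotted full names (e.g. "info.attrs.age") to assigned field IDs."""
    out: Dict[str, int] = {}
    for f in root:
        full = prefix + f.name
        out[full] = f.field_id
        out.update(ids_by_name(f.children, full + "."))
    return out


for assign in (assign_ordinal_root_post_order, assign_pre_order, assign_level_order):
    schema = example_schema()
    assign(schema)
    ids = ids_by_name(schema)
    # Every order yields unique IDs and the same highest ID; only the mapping differs
    assert len(set(ids.values())) == len(ids) and max(ids.values()) == 5
    print(assign.__name__, ids["info.attrs.age"])
```

Each order gives every field a unique ID and ends at the same highest ID (5); only which ID lands on which field changes, e.g. `info.attrs.age` gets 2 under the first scheme and 4 under the other two. That is why resolving IDs by full field name, as suggested above, does not depend on the assignment order. For this particular schema, pre-order and level-order happen to coincide; they diverge once a struct field is followed by another field at the same depth (for instance, if `attrs` came before `name`).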
