I can think of a few places where implementations have made decisions
correlated with field ID assignment. Specifically, when deciding which fields
are included in metrics collection by default: I think we once had field ID
assignment and that algorithm using the same traversal, but I don't think
they match anymore (in Java), and I have no idea what other engines and
implementations are doing. So, TL;DR: I don't think it matters.

On Tue, Jan 6, 2026 at 10:04 AM Péter Váry <[email protected]>
wrote:

> I'm not even sure that user-generated field IDs are kept. I think
> Iceberg generates the IDs using the util methods in TypeUtil, like
> assignFreshIds, reassignIds, etc.
>
> One similar discussion could be found here:
> https://github.com/apache/iceberg/issues/13164
>
> On Tue, Jan 6, 2026 at 4:19 PM Maximilian Michels <[email protected]>
> wrote:
>
>> Hi Shawn,
>>
>> To my knowledge, Iceberg does not specify the order of field id
>> assignments. Moreover, it does not even enforce monotonically increasing
>> field ids. The only requirement is that each field is uniquely identified
>> by an id, which remains stable over time.
>>
>> That means that users shouldn't make any assumptions about how field ids
>> are assigned. They should either assign field ids themselves by supplying a
>> Schema with their own ids, or look up the generated field ids using the
>> field name.
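A minimal sketch of the name-based lookup described above, in plain Python. `Field` and `index_by_name` are hypothetical stand-ins, not the pyiceberg or Iceberg Java API, and the sketch assumes nested struct fields are addressed by dot-joined names such as `info.attrs.age`. The IDs are the ones from the example schema later in this thread; the caller never depends on their order.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical minimal schema node: a named field, optionally a struct."""
    name: str
    field_id: int
    children: List["Field"] = field(default_factory=list)


def index_by_name(fields: List[Field], prefix: str = "") -> Dict[str, int]:
    """Map dot-joined field names (e.g. "info.attrs.age") to field IDs."""
    index: Dict[str, int] = {}
    for f in fields:
        full_name = f"{prefix}{f.name}"
        index[full_name] = f.field_id
        # Recurse into nested struct fields, using the parent name as prefix.
        index.update(index_by_name(f.children, prefix=f"{full_name}."))
    return index


# IDs as some engine happened to assign them; we only look them up by name.
schema = [
    Field("id", 0),
    Field("info", 1, [
        Field("name", 4),
        Field("attrs", 5, [Field("age", 2), Field("score", 3)]),
    ]),
]

ids = index_by_name(schema)
print(ids["info.attrs.age"])  # 2
```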
>>
>> Cheers,
>> Max
>>
>> On Mon, Dec 22, 2025 at 2:37 AM Shawn Chang <[email protected]>
>> wrote:
>>
>>> Hi folks,
>>>
>>> I’ve noticed some interesting differences across Iceberg clients when
>>> assigning new field IDs during schema conversion.
>>>
>>> Specifically:
>>>
>>>    1. *Iceberg Java* assigns field IDs using *ordinal order for the root
>>>    struct*, followed by a *post-order traversal* for nested structs.
>>>    For example:
>>>
>>>    struct<
>>>      0: id: required long,
>>>      1: info: optional struct<
>>>        4: name: optional string,
>>>        5: attrs: optional struct<
>>>          2: age: optional int,
>>>          3: score: optional double
>>>        >
>>>      >
>>>    >
>>>
>>>    Here, nested fields follow a post-order traversal: a struct's fields
>>>    get IDs only after its nested structs are done (age → score → name →
>>>    attrs).
>>>
>>>    2. *Iceberg Python* appears to use a *pre-order traversal* when
>>>    assigning fresh field IDs:
>>>    https://github.com/apache/iceberg-python/blob/950fc7131b8e597f73647c6ff2bd78d0b24102ad/pyiceberg/schema.py#L1295
>>>
>>>    3. *Iceberg Rust* does not currently have a helper for schema
>>>    conversion and field ID assignment, but some existing logic appears to
>>>    follow a *level-order traversal*:
>>>    https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/schema/id_reassigner.rs#L27
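To make the three orders concrete, here is a rough Python sketch that numbers the example schema each way. It does not reproduce any client's actual code; `Node` and the three functions are hypothetical, fields are keyed by dot-joined names, and IDs start at 0 to match the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical schema node: a field name plus any nested struct fields."""
    name: str
    children: List["Node"] = field(default_factory=list)


def post_order_nested(root: List[Node]) -> Dict[str, int]:
    """Root fields get 0..n-1 in ordinal order; nested structs are numbered
    bottom-up, so a struct's fields get IDs only after its nested structs."""
    ids: Dict[str, int] = {f.name: i for i, f in enumerate(root)}
    next_id = len(root)

    def visit(fields: List[Node], prefix: str) -> None:
        nonlocal next_id
        for f in fields:  # finish every nested struct first
            visit(f.children, f"{prefix}{f.name}.")
        for f in fields:  # then number this struct's own fields
            ids[f"{prefix}{f.name}"] = next_id
            next_id += 1

    for f in root:
        visit(f.children, f"{f.name}.")
    return ids


def pre_order(root: List[Node]) -> Dict[str, int]:
    """Depth-first: a field gets its ID before any of its nested fields."""
    ids: Dict[str, int] = {}

    def visit(fields: List[Node], prefix: str) -> None:
        for f in fields:
            ids[f"{prefix}{f.name}"] = len(ids)
            visit(f.children, f"{prefix}{f.name}.")

    visit(root, "")
    return ids


def level_order(root: List[Node]) -> Dict[str, int]:
    """Breadth-first: every field at depth d gets an ID before depth d + 1."""
    ids: Dict[str, int] = {}
    queue = deque((f, "") for f in root)
    while queue:
        f, prefix = queue.popleft()
        ids[f"{prefix}{f.name}"] = len(ids)
        queue.extend((c, f"{prefix}{f.name}.") for c in f.children)
    return ids


example = [
    Node("id"),
    Node("info", [Node("name"), Node("attrs", [Node("age"), Node("score")])]),
]
diverging = [Node("a", [Node("b")]), Node("c")]

for strategy in (post_order_nested, pre_order, level_order):
    print(strategy.__name__, strategy(example), strategy(diverging))
```

On the example schema, `post_order_nested` reproduces the IDs shown above, while `pre_order` and `level_order` both give `info.name` = 2 and `info.attrs.age` = 4; those two only diverge once a nested struct is followed by a sibling, as in `diverging`. All three hand out the same set of IDs and only disagree on which field owns which one.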
>>>
>>> This leads to two questions:
>>>
>>>    1. *Does the assignment order of fresh field IDs actually matter?*
>>>    My intuition is that it should not, as long as the field-ID → field
>>>    mapping is consistent and the highest field ID is tracked correctly,
>>>    but I would love to be corrected.
>>>
>>>    2. *If the order does matter, is there a recommended or canonical
>>>    traversal order that clients should follow?*
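The intuition in question 1 can be phrased as a check that ignores order entirely. The sketch below is hypothetical (`check_assignment` is not an Iceberg API): it only assumes IDs must be unique and that the table's highest assigned ID, which table metadata tracks as `last-column-id`, covers every assigned ID.

```python
from typing import Dict


def check_assignment(ids: Dict[str, int], last_column_id: int) -> None:
    """Hypothetical check on a fresh field ID assignment. Only uniqueness and
    the last-column-id bound are checked; traversal order is ignored."""
    values = list(ids.values())
    if len(set(values)) != len(values):
        raise ValueError("duplicate field IDs")
    if values and max(values) > last_column_id:
        raise ValueError("last_column_id is below an assigned field ID")


# The example schema numbered two different ways (IDs start at 0, as above).
post_order_ids = {"id": 0, "info": 1, "info.attrs.age": 2,
                  "info.attrs.score": 3, "info.name": 4, "info.attrs": 5}
pre_order_ids = {"id": 0, "info": 1, "info.name": 2, "info.attrs": 3,
                 "info.attrs.age": 4, "info.attrs.score": 5}

for ids in (post_order_ids, pre_order_ids):
    # Both pass: same set of IDs, different fields owning them.
    check_assignment(ids, last_column_id=5)
print("both assignments are valid")
```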
>>>
>>> Any guidance or historical context would be appreciated. Thanks!
>>>
>>> Best,
>>> Shawn
>>>
>>
