I'm not even sure that user-supplied field IDs are kept. I think
Iceberg generates the IDs using the util methods in TypeUtil, such as
assignFreshIds, reassignIds, etc.
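
To illustrate what that would mean in practice, here is a toy sketch in
Python. It is not the TypeUtil code, and Field, assign_fresh_ids and
index_by_name are made-up names. It assumes the ids really are regenerated:
in that case the ids supplied by the caller are discarded, and the reliable
way to get an id afterwards is to look it up by field name.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Field:
    # Hypothetical stand-in for a schema field; not an Iceberg class.
    name: str
    field_id: int
    children: list = field(default_factory=list)


def assign_fresh_ids(fields, next_id=None):
    """Return a copy of `fields` with new ids; the incoming ids are ignored."""
    if next_id is None:
        next_id = count(1)
    fresh = []
    for f in fields:
        new_id = next(next_id)  # id for this field, then its children
        fresh.append(Field(f.name, new_id, assign_fresh_ids(f.children, next_id)))
    return fresh


def index_by_name(fields, prefix=""):
    """Map dotted field names (e.g. 'info.attrs.age') to field ids."""
    index = {}
    for f in fields:
        full_name = prefix + f.name
        index[full_name] = f.field_id
        index.update(index_by_name(f.children, full_name + "."))
    return index


# Ids chosen by the user; they do not survive the reassignment below.
user_schema = [
    Field("id", 100),
    Field("info", 200, [Field("name", 201), Field("attrs", 202, [Field("age", 203)])]),
]

fresh_schema = assign_fresh_ids(user_schema)
ids = index_by_name(fresh_schema)
```

After this, ids["info.attrs.age"] gives whatever id was generated, with no
assumption about the order in which the ids were handed out.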

A similar discussion can be found here:
https://github.com/apache/iceberg/issues/13164

Maximilian Michels <[email protected]> wrote (on Tue, Jan 6, 2026,
16:19):

> Hi Shawn,
>
> To my knowledge, Iceberg does not specify the order of field id
> assignments. Moreover, it does not even enforce monotonically increasing
> field ids. The only requirement is that each field is uniquely identified
> by an id, which remains stable over time.
>
> That means that users shouldn't make any assumptions about how field ids
> are assigned. They should either assign field ids themselves by supplying a
> Schema with their own ids, or look up the generated field ids using the
> field name.
>
> Cheers,
> Max
>
> On Mon, Dec 22, 2025 at 2:37 AM Shawn Chang <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> I’ve noticed some interesting differences across Iceberg clients when
>> assigning new field IDs during schema conversion.
>>
>> Specifically:
>>
>>    1. *Iceberg Java* assigns field IDs using *ordinal order for the root
>>    struct*, followed by a *post-order traversal* for nested structs. For
>>    example:
>>
>>    struct<
>>      0: id: required long,
>>      1: info: optional struct<
>>        4: name: optional string,
>>        5: attrs: optional struct<
>>          2: age: optional int,
>>          3: score: optional double
>>        >
>>      >
>>    >
>>
>>    Here, nested fields follow a post-order traversal (age → score →
>>    name → attrs).
>>    2. *Iceberg Python* appears to use a *pre-order traversal* when
>>    assigning fresh field IDs:
>>
>>    https://github.com/apache/iceberg-python/blob/950fc7131b8e597f73647c6ff2bd78d0b24102ad/pyiceberg/schema.py#L1295
>>    3. *Iceberg Rust* does not currently have a helper for schema
>>    conversion and field ID assignment, but some existing logic appears to
>>    follow a *level-order traversal*:
>>
>>    https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/schema/id_reassigner.rs#L27
>>
>> This leads to two questions:
>>
>>    1. *Does the assignment order of fresh field IDs actually matter?*
>>    My intuition is that it should not, as long as the field-ID → field
>>    mapping is consistent and the highest field ID is tracked correctly, but I
>>    would love to be corrected.
>>    2. *If the order does matter, is there a recommended or canonical
>>    traversal order that clients should follow?*
>>
>> Any guidance or historical context would be appreciated. Thanks!
>>
>> Best,
>> Shawn
>>
>
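
For reference, the three orders described in the quoted message can be
compared with a small Python sketch. This is a toy reconstruction from the
descriptions in this thread, not the code of any of the clients; the
function names and the (name, children) tuple encoding are made up for the
example. The first function reproduces the ids shown in the Java example
above (root fields by ordinal, then nested structs innermost first).

```python
from collections import deque
from itertools import count

# Shawn's example, encoded as (name, children) tuples; types are omitted.
schema = [
    ("id", []),
    ("info", [("name", []), ("attrs", [("age", []), ("score", [])])]),
]
# Same fields, but the nested struct comes before its sibling.
swapped = [
    ("id", []),
    ("info", [("attrs", [("age", []), ("score", [])]), ("name", [])]),
]


def root_then_post_order(fields):
    """Root fields by ordinal, then nested structs innermost first."""
    ids, counter = {}, count(0)
    for name, _ in fields:
        ids[name] = next(counter)

    def visit_struct(path, children):
        for name, grandchildren in children:  # nested structs first
            if grandchildren:
                visit_struct(f"{path}.{name}", grandchildren)
        for name, _ in children:  # then this struct's own fields
            ids[f"{path}.{name}"] = next(counter)

    for name, children in fields:
        if children:
            visit_struct(name, children)
    return ids


def pre_order(fields):
    """Each field gets its id before any of its children."""
    ids, counter = {}, count(0)

    def visit(level, prefix):
        for name, children in level:
            ids[prefix + name] = next(counter)
            visit(children, prefix + name + ".")

    visit(fields, "")
    return ids


def level_order(fields):
    """All fields of one struct get ids before any nested struct is opened."""
    ids, counter = {}, count(0)
    queue = deque([("", fields)])
    while queue:
        prefix, level = queue.popleft()
        for name, children in level:
            ids[prefix + name] = next(counter)
            if children:
                queue.append((prefix + name + ".", children))
    return ids


java_like = root_then_post_order(schema)
pre = pre_order(swapped)
level = level_order(swapped)
```

On Shawn's original schema, pre_order and level_order happen to produce the
same ids because the nested struct is the last field; the swapped variant is
only there to make them diverge. In every case the ids are unique and the
highest id is the same, which is the property the first question asks about.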
