adp2201 commented on PR #14297:
URL: https://github.com/apache/iceberg/pull/14297#issuecomment-4049987669

   Thanks for driving this forward — this is a valuable capability.
   
   One high-level concern I still have is around deterministic behavior over time.
   Could we make the shredding selection contract more explicit so we avoid layout instability across files/batches?
   
   What would help a lot before merge:
   1. Define the selection/stability contract clearly (same logical input profile should produce the same shredded layout, independent of record arrival order as much as possible),
   2. Provide a user-controlled override path (table/write option) to pin shredding for important fields,
   3. Add consistency-focused tests across multiple batches/files, including:
      - mixed-type fields,
      - null-heavy fields,
      - decimal precision/scale variations,
      - field frequency threshold boundaries.
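
   To make points 1 to 3 concrete, here is a minimal sketch of what an order-independent selection rule could look like. It is not the PR's implementation: `select_shredded_fields`, `min_frequency`, `max_fields` and `pinned` are hypothetical names chosen for illustration, and decimal precision/scale handling is left out.

```python
from collections import Counter, defaultdict


def select_shredded_fields(records, min_frequency=0.5, max_fields=2, pinned=()):
    """Return a sorted list of (field, type_name) pairs chosen for shredding.

    Hypothetical helper: the result depends only on per-field counts,
    never on the order in which records arrive.
    """
    total = len(records)
    type_counts = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            if value is not None:  # nulls do not vote for a type
                type_counts[field][type(value).__name__] += 1

    def dominant_type(field):
        # Most frequent type wins; ties are broken by type name,
        # not by which type happened to be seen first.
        return min(type_counts[field].items(), key=lambda kv: (-kv[1], kv[0]))[0]

    candidates = []
    for field, counts in type_counts.items():
        present = sum(counts.values())
        if total and present / total >= min_frequency:  # threshold is inclusive
            candidates.append((field, present))

    # Rank by frequency, then by field name, so the cut-off is stable.
    candidates.sort(key=lambda fp: (-fp[1], fp[0]))
    chosen = {field for field, _ in candidates[:max_fields]}

    # Pinned fields bypass the threshold, but only if a type was observed.
    chosen.update(f for f in pinned if f in type_counts)
    return sorted((f, dominant_type(f)) for f in chosen)


batch = [
    {"id": 1, "price": 9.5, "note": None},
    {"id": 2, "price": "n/a", "note": None},
    {"id": 3, "price": 7.0, "tag": "x"},
    {"id": 4, "price": 3, "tag": None},
]
layout = select_shredded_fields(batch)
reversed_layout = select_shredded_fields(list(reversed(batch)))
pinned_layout = select_shredded_fields(batch, pinned=("tag",))
```

   A consistency test in this spirit would assert that `layout == reversed_layout`, and that a pinned field such as `tag` is kept even though it falls below the frequency threshold.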
   
   The implementation approach makes sense as an interim path under DSv2 
constraints; making the contract and invariants explicit would make this much 
safer to operate in production.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

