tanmayrauth opened a new pull request, #1157:
URL: https://github.com/apache/iceberg-go/pull/1157

   Resolve the table's default sort order to Arrow compute SortKeys and apply 
compute.SortRecordBatch to each batch before flushing it to a data file. Sorting 
is skipped for unsorted tables (sort_order_id 0) and for delete files, to which 
the table-level sort order does not apply.
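A minimal sketch of the resolve-then-sort step, written in plain Go rather than against Arrow. The types `SortField`, `SortOrder` and `Batch` and the functions `resolveSortKeys` and `sortBatch` are hypothetical stand-ins, not iceberg-go or Arrow APIs. The sketch assumes a toy row-major batch of nullable int64 values. It shows the two skip conditions and a stable multi-key sort. Null placement is kept independent of direction, matching the Iceberg spec's separate direction and null-order settings.

```go
package main

import (
	"fmt"
	"sort"
)

// SortField is a hypothetical, simplified sort field: a column index plus
// the direction and null order the Iceberg spec attaches to each field.
type SortField struct {
	Column     int
	Descending bool
	NullsFirst bool
}

// SortOrder is a hypothetical sort order; OrderID 0 means "unsorted".
type SortOrder struct {
	OrderID int
	Fields  []SortField
}

// Batch is a toy row-major batch of nullable int64 values (nil = null).
type Batch [][]*int64

// resolveSortKeys returns nil when no sort should be applied: unsorted
// tables (order ID 0) and delete files.
func resolveSortKeys(order SortOrder, isDeleteFile bool) []SortField {
	if order.OrderID == 0 || isDeleteFile || len(order.Fields) == 0 {
		return nil
	}
	return order.Fields
}

// compareValues orders two nullable values for one field. Null placement
// does not flip with direction; only non-null comparisons are reversed.
func compareValues(a, b *int64, f SortField) int {
	switch {
	case a == nil && b == nil:
		return 0
	case a == nil:
		if f.NullsFirst {
			return -1
		}
		return 1
	case b == nil:
		if f.NullsFirst {
			return 1
		}
		return -1
	}
	c := 0
	if *a < *b {
		c = -1
	} else if *a > *b {
		c = 1
	}
	if f.Descending {
		c = -c
	}
	return c
}

// sortBatch stably sorts the rows in place, comparing keys in order.
func sortBatch(b Batch, keys []SortField) {
	if len(keys) == 0 {
		return
	}
	sort.SliceStable(b, func(i, j int) bool {
		for _, k := range keys {
			if c := compareValues(b[i][k.Column], b[j][k.Column], k); c != 0 {
				return c < 0
			}
		}
		return false
	})
}

func ptr(v int64) *int64 { return &v }

func fmtVal(v *int64) string {
	if v == nil {
		return "null"
	}
	return fmt.Sprint(*v)
}

func main() {
	order := SortOrder{OrderID: 1, Fields: []SortField{
		{Column: 0, NullsFirst: true},                   // ascending, nulls first
		{Column: 1, Descending: true, NullsFirst: false}, // descending, nulls last
	}}
	b := Batch{{ptr(2), ptr(1)}, {nil, ptr(5)}, {ptr(1), ptr(3)}, {ptr(2), ptr(9)}}
	sortBatch(b, resolveSortKeys(order, false))
	for _, row := range b {
		fmt.Println(fmtVal(row[0]), fmtVal(row[1]))
	}
}
```

Running it prints the null row first, then `1 3`, then the two rows whose first column is 2, ordered by the second column descending (`2 9`, then `2 1`).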
   
   All data write paths route through writerFactory and RollingDataWriter 
(unpartitioned, clustered, and hash-fanout), so Append, Overwrite, the data-file 
writes performed by Delete, WriteRecords, and compaction (which calls 
WriteRecords) all pick up the sort.
   
   Limitations:
   - Per-batch, not per-file: each batch passed to Add() is sorted on its own; 
rows are not merged across batches, so a file holding several batches is not 
globally sorted. Matches PyIceberg.
   - Sort fields with a bucket transform fall back to ordering by the raw 
source column (the manifest still records sort_order_id verbatim).
   - Multi-arg transform sort fields use only the first source column.
   - Nested-field sort sources are rejected with a clear error.
   
   Closes #833 (https://github.com/apache/iceberg-go/issues/833).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

