tanmayrauth opened a new pull request, #1157: URL: https://github.com/apache/iceberg-go/pull/1157
Resolve the table's default sort order to Arrow compute SortKeys and apply compute.SortRecordBatch to each batch before flushing it to a data file. Sorting is skipped for unsorted tables (sort_order_id 0) and for delete files, where the table-level order does not apply.

All data write paths route through writerFactory + RollingDataWriter (unpartitioned, clustered, and hash-fanout), so Append, Overwrite, Delete data writes, WriteRecords, and compaction (which calls WriteRecords) all pick up the sort.

Limitations:
- Per-batch, not per-file: each Add() is sorted internally; rows are not merged across batches. Matches PyIceberg.
- Bucket transform falls back to source-column order (the manifest still records sort_order_id verbatim).
- Multi-arg transform sort fields use only the first source column.
- Nested-field sort sources are rejected with a clear error.

Closes #833 (https://github.com/apache/iceberg-go/issues/833).
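
To illustrate the per-batch limitation described above, here is a minimal standard-library sketch. It is not the code in this PR and does not use Arrow: `SortKey`, `Batch`, `sortBatch`, and `writeBatches` are hypothetical stand-ins for the resolved sort keys, a record batch, the per-batch sort step, and the writer loop. An empty key list is assumed to model an unsorted table.

```go
package main

import (
	"fmt"
	"sort"
)

// SortKey is a hypothetical stand-in for a resolved sort key:
// a column index plus a direction.
type SortKey struct {
	Column     int
	Descending bool
}

// Batch is a hypothetical row-oriented stand-in for a record batch.
type Batch [][]int

// sortBatch returns a sorted copy of one batch; the row order of the
// input batch is left untouched. An empty key list models an unsorted
// table, so rows are returned in their original order.
func sortBatch(b Batch, keys []SortKey) Batch {
	out := make(Batch, len(b))
	copy(out, b)
	if len(keys) == 0 {
		return out
	}
	sort.SliceStable(out, func(i, j int) bool {
		for _, k := range keys {
			x, y := out[i][k.Column], out[j][k.Column]
			if x == y {
				continue // tie on this key, fall through to the next one
			}
			if k.Descending {
				return x > y
			}
			return x < y
		}
		return false
	})
	return out
}

// writeBatches sorts each batch independently and appends it to the
// output, without merging rows across batches.
func writeBatches(batches []Batch, keys []SortKey) Batch {
	var file Batch
	for _, b := range batches {
		file = append(file, sortBatch(b, keys)...)
	}
	return file
}

func main() {
	batches := []Batch{
		{{3, 10}, {1, 20}},
		{{2, 30}, {0, 40}},
	}
	keys := []SortKey{{Column: 0}}
	for _, row := range writeBatches(batches, keys) {
		fmt.Println(row)
	}
}
```

With two input batches, the first column of the output reads 1, 3, 0, 2: each batch is ordered internally, but the file as a whole is not globally sorted, which is the trade-off the first limitation describes.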
