tanmayrauth commented on issue #1080: URL: https://github.com/apache/iceberg-go/issues/1080#issuecomment-4455030135
PartitionSpec (TruncateTransform): controls which rows go into which files.
All rows sharing the same truncated prefix land in the same partition's data
files. This is what enables file-level (partition) pruning during scans.
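To make the routing concrete, here is a minimal pure-Go sketch of how `truncate[20]` buckets string values. It is not iceberg-go code. The helpers `truncateString` and `groupByPartition` are hypothetical, and the sketch assumes the Iceberg spec's rule for strings, which keeps at most W Unicode code points. In the output, the two analytics projects share one partition key and the billing project gets its own. In Iceberg, an equality filter on the raw column can be projected onto the truncated partition value, so a scan for one project only has to open files in that one partition.

```go
package main

import (
	"fmt"
	"sort"
)

// truncateString approximates Iceberg's truncate[W] transform for strings:
// keep at most w Unicode code points (runes), not bytes.
func truncateString(s string, w int) string {
	r := []rune(s)
	if len(r) <= w {
		return s
	}
	return string(r[:w])
}

// groupByPartition is a hypothetical helper that buckets values by their
// truncated key, the way a writer routes rows to a partition's data files.
func groupByPartition(values []string, w int) map[string][]string {
	parts := make(map[string][]string)
	for _, v := range values {
		key := truncateString(v, w)
		parts[key] = append(parts[key], v)
	}
	return parts
}

func main() {
	projects := []string{
		"proj-2024-analytics-east",
		"proj-2024-analytics-west",
		"proj-2024-billing",
	}
	parts := groupByPartition(projects, 20)

	// Print partitions in a stable order (map iteration order is random).
	keys := make([]string, 0, len(parts))
	for k := range parts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%q -> %v\n", k, parts[k])
	}
}
```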
SortOrder (IdentityTransform): controls the physical row ordering within
each data file. This is what keeps Parquet column-chunk min/max statistics
tight within a file, enabling row-group skipping.
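The effect on row-group skipping can be shown with a toy model. It uses hypothetical types rather than any Parquet library API: split a column into consecutive row groups, record each group's min/max, and count how many groups an equality lookup still has to read. Real row groups hold far more rows. Two rows per group just keeps the arithmetic visible.

```go
package main

import (
	"fmt"
	"sort"
)

// rowGroupStats is a hypothetical stand-in for Parquet's per-row-group
// min/max column statistics.
type rowGroupStats struct {
	Min, Max string
}

// statsFor splits values into consecutive row groups of size n and records
// each group's min and max, as a writer would for one column chunk.
func statsFor(values []string, n int) []rowGroupStats {
	var out []rowGroupStats
	for start := 0; start < len(values); start += n {
		end := start + n
		if end > len(values) {
			end = len(values)
		}
		s := rowGroupStats{Min: values[start], Max: values[start]}
		for _, v := range values[start+1 : end] {
			if v < s.Min {
				s.Min = v
			}
			if v > s.Max {
				s.Max = v
			}
		}
		out = append(out, s)
	}
	return out
}

// groupsToRead counts row groups whose [Min, Max] range could contain target;
// every other group can be skipped for an equality predicate.
func groupsToRead(stats []rowGroupStats, target string) int {
	n := 0
	for _, s := range stats {
		if s.Min <= target && target <= s.Max {
			n++
		}
	}
	return n
}

func main() {
	values := []string{"d", "a", "f", "b", "e", "c"}
	sorted := append([]string(nil), values...)
	sort.Strings(sorted)

	fmt.Println("unsorted:", groupsToRead(statsFor(values, 2), "c")) // 3
	fmt.Println("sorted:", groupsToRead(statsFor(sorted, 2), "c"))   // 1
}
```

With sorted input only the middle group's range can contain `"c"`, so the other two are skipped. Unsorted, every group's range overlaps the target and all three must be read.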
Best practice for your case:
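```go
import (
	"github.com/apache/iceberg-go"
	"github.com/apache/iceberg-go/catalog"
	"github.com/apache/iceberg-go/table"
)

// Assumes ctx, cat (a catalog.Catalog), tableIdent and icebergSchema already
// exist, and that this runs inside a function that returns an error.

// Partition on truncate(20) of field 2: rows sharing a 20-character prefix
// land in the same partition's data files.
partitionSpec := iceberg.NewPartitionSpec(
	iceberg.PartitionField{
		SourceIDs: []int{2},
		Transform: iceberg.TruncateTransform{Width: 20},
		Name:      "project_partition",
	},
)

// Within each data file, sort by the full (identity) value of field 2.
sortField := table.SortField{
	SourceIDs: []int{2},
	Transform: iceberg.IdentityTransform{},
	Direction: table.SortASC,
	NullOrder: table.NullsLast,
}
sortOrder, err := table.NewSortOrder(table.InitialSortOrderID,
	[]table.SortField{sortField})
if err != nil {
	return err
}
tbl, err := cat.CreateTable(ctx, tableIdent, icebergSchema,
	catalog.WithPartitionSpec(&partitionSpec),
	catalog.WithSortOrder(sortOrder),
)
if err != nil {
	return err
}
```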
You don't want TruncateTransform in the sort order. Within a partition, every
row already shares the same truncated prefix, so sorting by the truncated value
gives all of them the same sort key and imposes no useful ordering. Use
IdentityTransform{} instead to sort by the full raw value, which gives you the
tightest possible min/max stats within each file.
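A small sketch of why truncation makes a poor sort key. Within one partition every truncated key is identical, so a stable sort by it leaves the rows in arrival order, while an identity sort actually orders them. `truncateString` and `sortByKey` are hypothetical helpers that approximate the transforms; they are not iceberg-go APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// truncateString approximates Iceberg's truncate[W] on strings (first w runes).
func truncateString(s string, w int) string {
	r := []rune(s)
	if len(r) > w {
		r = r[:w]
	}
	return string(r)
}

// sortByKey returns a stably sorted copy of rows, ordered by key(row).
func sortByKey(rows []string, key func(string) string) []string {
	out := append([]string(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool { return key(out[i]) < key(out[j]) })
	return out
}

func main() {
	// All rows share the same 20-character prefix, so they sit in one partition.
	partition := []string{
		"proj-2024-analytics-west",
		"proj-2024-analytics-east",
		"proj-2024-analytics-north",
	}
	byTruncate := sortByKey(partition, func(s string) string { return truncateString(s, 20) })
	byIdentity := sortByKey(partition, func(s string) string { return s })

	fmt.Println("truncate:", byTruncate) // arrival order kept: every key is equal
	fmt.Println("identity:", byIdentity) // east, north, west
}
```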
In short: the partition spec decides which file a row lands in, and the sort
order decides how rows are arranged within that file. Use truncate for the
former and identity for the latter.
