Hey folks,

Iceberg users are advised not only to partition their data but also to sort 
it within partitions by the columns used in query predicates in order to get 
the best performance. Right now, this process is mostly manual: users have to 
sort the data themselves before writing.
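
To make the benefit concrete, here is a rough sketch of why sorting helps, 
assuming an engine skips files based on per-file min/max column stats. The 
helpers `split_into_files` and `files_to_scan` are made up for illustration 
and are not Iceberg APIs; the "files" are just in-memory chunks.

```python
import random


def split_into_files(values, rows_per_file):
    """Chunk values into hypothetical 'files' and record min/max stats for each."""
    files = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        files.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return files


def files_to_scan(files, lo, hi):
    """Count files whose [min, max] range overlaps the predicate lo <= x <= hi."""
    return sum(1 for f in files if f["max"] >= lo and f["min"] <= hi)


# One "partition" with 1000 rows, written in arbitrary order.
rng = random.Random(0)
values = list(range(1000))
rng.shuffle(values)

unsorted_files = split_into_files(values, 100)
sorted_files = split_into_files(sorted(values), 100)

# Predicate: 250 <= x <= 260
unsorted_scanned = files_to_scan(unsorted_files, 250, 260)
sorted_scanned = files_to_scan(sorted_files, 250, 260)

print("files scanned without sorting:", unsorted_scanned)
print("files scanned with sorting:", sorted_scanned)
```

With sorted input, the matching rows land in a single file, so only that file 
has to be read. With shuffled input, most files tend to have wide min/max 
ranges and cannot be skipped.
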

I am wondering if we should extend Iceberg metadata so that query engines can 
do this automatically in the future. We already have `sortColumns` in DataFile, 
but it is not used.

Do we need a notion of sort columns in TableMetadata?

Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. 
However, it seems reasonable to have partitioned and sorted tables without 
bucketing. How do we see this in Iceberg?

If we decide to have a sort spec in the metadata, do we want to make it part 
of PartitionSpec or keep it separate?

Thanks,
Anton
