rdblue commented on a change in pull request #2055:
URL: https://github.com/apache/iceberg/pull/2055#discussion_r555258356
##########
File path: site/docs/spec.md
##########
@@ -254,6 +254,24 @@ Notes:
2. The width, `W`, used to truncate decimal values is applied using the scale
of the decimal column to avoid additional (and potentially conflicting)
parameters.
+### Sorting
+
+Users could sort their data within partitions by columns to gain performance.
The information on how the data is sorted could be declared per data or delete
file, by a **sort order**.
+
+A sort order is defined by a sort order id and a list of sort fields. The
order of the sort fields within the list defines the order in which the sort is
applied to the data. Each sort field consists of:
+
+* A **source column id** from the table's schema
+* A **transform** that is used to produce values to be sorted on from the
source column. This is the same transform as described in [partition
transforms](#partition-transforms).
+* A **sort direction**, that can only be either `asc` or `desc`
+* A **null order** that describes the order of null values when sorted. Can
only be either `nulls-first` or `nulls-last`
+
+Order id `0` is reserved for the unsorted order.
+
+A data or delete file is associated with a sort order by the sort order's id
within [a manifest](#manifests). Therefore, the table must declare all the sort
orders for lookup. A table could also be configured with a default sort order
id, indicating how the new data should be sorted by default. This default could
be overridden on a per-file basis if the file is sorted differently. For
example, if the engine is incapable of ensuring ordering of the data on write,
the generated files should be annotated with sort order id 0 (unsorted).
+
+Note that only data files and equality delete files could have sort order.
[Position deletes](#position-delete-files) should not have sort order, since
they have their own sorting requirements.
Review comment:
I would rather not say "should not have a sort order". Instead, this
should note that position deletes are required to be sorted by file and
position. The manifest should not be written with an order ID for position
delete files, and readers must ignore the field for those files. This should
probably be in the manifests section instead of here.
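
A rough sketch of the behavior described above, in case it helps make the
wording concrete. This is not Iceberg's API: `FileContent`, `ManifestEntry`,
`write_entry`, and `effective_sort_order_id` are hypothetical names chosen for
illustration, and the manifest entry is reduced to the three fields that matter
here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Order id 0 is reserved for the unsorted order (from the proposed spec text).
UNSORTED_ORDER_ID = 0


class FileContent(Enum):
    """Hypothetical file content kinds, for illustration only."""
    DATA = "data"
    POSITION_DELETES = "position-deletes"
    EQUALITY_DELETES = "equality-deletes"


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, heavily simplified manifest entry."""
    path: str
    content: FileContent
    sort_order_id: Optional[int] = None


def write_entry(path: str, content: FileContent,
                sort_order_id: Optional[int]) -> ManifestEntry:
    """Writer side: never record an order id for position delete files."""
    if content is FileContent.POSITION_DELETES:
        # Position deletes are always sorted by file and position,
        # so no order id is written for them.
        return ManifestEntry(path, content, None)
    return ManifestEntry(path, content, sort_order_id)


def effective_sort_order_id(entry: ManifestEntry) -> Optional[int]:
    """Reader side: ignore the field for position delete files."""
    if entry.content is FileContent.POSITION_DELETES:
        return None
    return entry.sort_order_id
```

With this shape, a writer that cannot guarantee ordering would pass
`UNSORTED_ORDER_ID` for data and equality delete files, and a reader would get
`None` for a position delete file even if an older writer had set the field.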