> This one seems really problematic. Too many important optimizations depend on the file sort order. Can we have the writer verify the sort order as the files are written
Even if we did, when the desired sort order changes we can't just rewrite all of the data in the table. I think that this will lead to cases where performance degrades without regular maintenance, but I don't see a way to handle that other than degrading performance. Forcing a rewrite to sort data when the desired order changes just isn't possible, right? On Thu, Jul 18, 2019 at 7:33 AM Owen O'Malley <owen.omal...@gmail.com> wrote: > > > On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi > <aokolnyc...@apple.com.invalid> wrote: > >> Let me summarize what we talked here and follow up with a PR. >> >> - Iceberg should allow users to define a sort oder in its metadata that >> applies to partitions. >> - We should never assume the sort order is actually applied to all files >> in the table. >> > > This one seems really problematic. Too many important optimizations depend > on the file sort order. Can we have the writer verify the sort order as the > files are written? > > - Sort orders might evolve and change over time. When this happens, >> existing files will not be rewritten. Query engines should follow the >> updated sort order during subsequent writes. As a result, files within a >> table or partition can be sorted differently at a given point in time. >> - We should be able to define a sort order even for unpartitioned tables, >> as opposed to current Spark tables that allow a sort order only for >> bucketed tables. >> - SortOrder is separate from PartitionSpec. >> - SortOrder will rely on transformations to define complex sort orders. >> - Files will be annotated with sort_order_id instead of sort_columns. We >> keep the question of file_ordinal open for now. >> - To begin with, we will support asc/desc natural sort orders (UTF8 >> ordering for Strings). >> >> Thanks, >> Anton >> >> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote: >> >> I agree that Iceberg metadata should include a way to configure a desired >> sort order. But I want to note that I don’t think that we can ever assume >> that it has been applied. Table configuration will evolve as use changes. >> We don’t want to require rewrites when a configuration gets updated, so an >> assumption should be that data files might not be sorted. >> >> Files that are sorted should indicate how they are sorted, so that >> optimizations are applied if the file’s metadata indicates it can be safely >> applied. For example, if both deletes and data rows are sorted the same >> way, you can merge the two streams instead of using a hash set to check >> whether a record has been deleted. I think this should rely on the delete >> file’s sort order matching the data file it is applied to. >> >> Should Iceberg allow users to define a sort spec only if the table is >> bucketed? >> >> No. In Iceberg, bucketing is just another partition transform. >> >> However, I think we need to consider what a sort order will mean. Here >> are a few observations: >> >> - Each file can have a sort order for its rows (Spark’s >> sortWithinPartitions, which sorts each task’s data) >> - Sorting is also used to cluster values across files so it makes >> sense for a table sort order to be applied within partitions (ORDER BY >> ) >> - Multiple writes to the same partition are not expected to rewrite >> existing data, so a partition may only be partially sorted or may have >> multiple sorted file sets >> - Partitioning is independent from sorting. Even when partitioning is >> orthogonal to a sort order (i.e., bucketing), partitioning must still take >> precedence. >> >> My conclusion is that a configured sort order applies to partitions, not >> data across partitions. Again, bucketing is just another type of partition. >> >> How should Iceberg encode sort specs? >> >> I don’t think this should be in table properties. The sort order should >> reference columns by ID so it doesn’t need to be changed when columns are >> renamed. I think this should be implemented like PartitionSpec. >> >> If sorting is applied within partitions, then I would make PartitionSpec >> and SortOrder separate. I would still use transforms to produce more >> complex sort orders. I think that’s a great idea, but we don’t need to mix >> partitioning and sorting to reuse transforms. Like partition specs, I think >> a table should be able to define multiple sort orders and each should be >> identified by ID. Then each data file can encode which sort order it was >> written with, just like manifests and partition specs. >> >> I think we should add sort-orders like partition-specs, and a >> default-sort-order-id like default-spec-id. This would also require >> removing sort_columns from data files in the spec >> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We >> can keep file_ordinal, but probably want to add some context to know the >> group of files where it is valid. We could also remove it. >> >> Which sort orders should Iceberg support? >> >> I agree with what’s already been said: we should use a natural order for >> each type >> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>, >> ascending and descending. To start, Strings must use UTF-8’s natural >> ordering and we can expand from there. >> >> Here’s what a sort order might look like: >> >> "sort-orders": [ >> { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] }, >> { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { >> "source-id": 5, "ascending": false } ] }, >> { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, >> "source-ids": [4, 5] } ] }, >> ] >> >> >> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi < >> aokolnyc...@apple.com.invalid> wrote: >> >>> In order to begin prototyping, I would start with the following >>> questions. >>> >>> 1) Does Iceberg need a sort spec? >>> - I would say yes >>> 2) Should Iceberg allow users to define a sort spec only if the table is >>> bucketed? >>> - I would say no, as it seems valid to have partitioned and sorted >>> tables. >>> 3) How should Iceberg encode sort specs? >>> - Option #1 is to rely on table properties, which will allow us to use >>> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am >>> not sure it would be easy to encode non-trivial sort specs and track sort >>> spec evolution (if needed). >>> - Option #2 is to extend PartitionSpec to cover sorting as well. This >>> option will allow us to use transformations to encode non-trivial sorts and >>> won't require many changes to the codebase. >>> - Option #3 is to store SortSpec separately from PartitionSpec. This >>> will require more changes compared to Option #2 but can also give us extra >>> flexibility. >>> >>> Each option has its own trade-offs, but I tend to think #2 is reasonable. >>> >>> 4) Which sort orders should Iceberg support? >>> - I think we have to be flexible and support adding more sort orders >>> later. In addition to what Owen said, we can add sorting based on >>> multi-dimensional space-filling curves in the future. >>> >>> >>> What do you think? >>> >>> Thanks, >>> Anton >>> >>> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omal...@gmail.com> wrote: >>> >>> My thought is just like Iceberg has to define partitioning and >>> bucketing, it has to define a canonical sort order. In particular, we can’t >>> afford to have Spark, Presto, and Hive writing files in different orders. I >>> believe the right approach is to define a sort order as a series of columns >>> where each column is either ascending or descending and defining the >>> natural sort order for each type. >>> >>> The hard bit will be if we need to support non-natural sorts of strings. >>> For example, if we need to support case-insensitive sorts or the different >>> collations that databases support, I’d hope that we could start with the >>> default of utf-8 byte ordering and expand as needed. If you are curious >>> what the different collations look like - >>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database >>> . >>> >>> .. Owen >>> >>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi < >>> aokolnyc...@apple.com.INVALID> wrote: >>> >>> Hey folks, >>> >>> Iceberg users are advised not only to partition their data but also to >>> sort within partitions by columns in predicates in order to get the best >>> performance. Right now, this process is mostly manual and performed by >>> users before writing. >>> I am wondering if we should extend Iceberg metadata so that query >>> engines can do this automatically in the future. We already have >>> `sortColumns` in DataFile but they are not used. >>> >>> - Do we need a notion of sort columns in TableMetadata? >>> - Spark’s sort spec is tightly coupled with bucketing and cannot be >>> used alone. However, it seems reasonable to have partitioned and sorted >>> tables without bucketing. How do we see this in Iceberg? >>> - If we decide to have sort spec in the metadata, do we want to make >>> it part of PartitionSpec or have it separately? >>> >>> Thanks, >>> Anton >>> >>> >>> >>> >> >> -- >> Ryan Blue >> Software Engineer >> Netflix >> >> >> -- Ryan Blue Software Engineer Netflix