Yep! We discussed this yesterday. The general plan going forward will be:

Phase 1: Merge Sort based compaction
  - Allow compaction/rewrite of data files using a space-filling-curve-based sort.
  - No planning or persisting of metrics.

Phase 2: Support for Transforms with multiple arguments and possible parameterization
  - Store metrics for curve values in the data file metrics, along with the transform used when writing the file.
  - Query planning using these metrics.

In my mind the final picture looks like:

DataFileMetrics {
  zMax = ?,
  zMin = ?,
  sortOrder = 1
}

Table Metadata {
  SortOrder 1 = "HilbertCurve(x, y, z) + Options { }"
  SortOrder 2 = "ZOrder(x,y) + Options(y using 128 bytes)"
}

Or something like that. This way, for any given data file, we can generate filters based on the ordering function used for that particular data file, and we can update our definitions of functions over time, etc.

I think the main spec change here is figuring out how to store these transforms with more information (and multiple args).

> On Jul 22, 2021, at 8:37 AM, Piotr Findeisen <pi...@starburstdata.com> wrote:
>
> Hi Bhavyam,
>
> Has this been discussed on the sync?
> Ryan, will it be making into the table metadata spec?
>
> Best,
> PF
>
> On Wed, Jul 21, 2021 at 1:50 PM Bhavyam Kamal <bhavyam.ka...@dremio.com> wrote:
> Hi Everyone,
>
> I would like to discuss and get feedback on the following proposal for
> Z-Ordering in the Iceberg Sync today:
>
> https://docs.google.com/document/d/1UfGxaB7qlrGzzMk9pBm03oKPOkm-jk-NQVQQvHP-0Bc/edit?usp=sharing
>
> Please let me know if you have any thoughts or suggestions by adding comments
> in the doc.
>
> Thanks and regards,
> Bhavyam