Yep! We discussed this yesterday.

The general plan going forward will be:

Phase 1:
Merge Sort based compaction
Allow compaction/rewrite of data files using a space filling curve based sort. 
No planning or persisting of metrics.

Phase 2:
Support for Transforms with multiple arguments and possible parameterization
Store metrics for curve values in the data file metrics, along with the 
transform used when writing the file
Query planning using these metrics.
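
To make the Phase 1 idea concrete, here is a rough sketch of sorting rows by 
a space filling curve value. It uses a plain Z-order (Morton) interleave of 
two non-negative integer columns. The names interleave_bits and z_sort are 
made up for illustration and are not Iceberg APIs; real columns would also 
need type-specific encoding, which this ignores.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order value.

    Assumes x and y are non-negative integers that fit in `bits` bits.
    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_sort(rows):
    """Return rows (tuples whose first two fields are x and y) in Z-order."""
    return sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))


if __name__ == "__main__":
    rows = [(1, 1), (0, 0), (1, 0), (0, 1)]
    print(z_sort(rows))
```

A rewrite job would then write the sorted rows out in file-sized chunks, so 
each file covers a contiguous range of curve values.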


In my mind the final picture looks like

DataFileMetrics { zMax = ?, zMin = ?, sortOrder = 1 }

Table Metadata {
  SortOrder 1 = "HilbertCurve(x, y, z) + Options { }"
  SortOrder 2 = "ZOrder(x,y) + Options(y using 128 bytes)" 
}

Or something like that. This way, for any given data file, we can generate 
filters based on the ordering function that was used to write that file, and 
we can update our function definitions over time.
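
As a sketch of how planning could use this (all names here are hypothetical, 
not an existing Iceberg API): each file records the curve min/max plus the ID 
of the sort order it was written with, the table metadata maps that ID to a 
curve function, and the planner skips a file only when it knows the function 
and the ranges cannot overlap. The example assumes a two-column Z-order over 
non-negative integers, which is monotone in each column, so the corners of a 
query rectangle bound every curve value inside it.

```python
from dataclasses import dataclass


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) value of two non-negative ints; illustrative only."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


@dataclass
class DataFileMetrics:
    path: str
    z_min: int
    z_max: int
    sort_order_id: int  # which table-level sort order produced z_min/z_max


# Hypothetical table metadata: sort order ID -> curve function.
SORT_ORDERS = {2: z_value}


def may_match(f, sort_orders, x_range, y_range):
    """Return False only if the file provably holds no rows in the rectangle."""
    curve = sort_orders.get(f.sort_order_id)
    if curve is None:
        return True  # unknown ordering function: cannot prune safely
    lo = curve(x_range[0], y_range[0])  # smallest curve value in the rectangle
    hi = curve(x_range[1], y_range[1])  # largest curve value in the rectangle
    return f.z_max >= lo and f.z_min <= hi


if __name__ == "__main__":
    files = [
        DataFileMetrics("a.parquet", 0, 3, 2),
        DataFileMetrics("b.parquet", 48, 63, 2),
        DataFileMetrics("c.parquet", 0, 0, 99),
    ]
    keep = [f.path for f in files if may_match(f, SORT_ORDERS, (0, 1), (0, 1))]
    print(keep)
```

Because the lookup goes through the sort order ID stored with the file, files 
written under an older function definition keep working when a new sort 
order is added later.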

I think the main spec change here is figuring out how to store these transforms 
with more information (and multiple args).

> On Jul 22, 2021, at 8:37 AM, Piotr Findeisen <pi...@starburstdata.com> wrote:
> 
> Hi Bhavyam,
> 
> Has this been discussed on the sync?
> Ryan, will it be making it into the table metadata spec?
> 
> Best,
> PF
> 
> On Wed, Jul 21, 2021 at 1:50 PM Bhavyam Kamal <bhavyam.ka...@dremio.com> wrote:
> Hi Everyone,
> 
> I would like to discuss and get feedback on the following proposal for 
> Z-Ordering in the Iceberg Sync today:
> 
> https://docs.google.com/document/d/1UfGxaB7qlrGzzMk9pBm03oKPOkm-jk-NQVQQvHP-0Bc/edit?usp=sharing
> 
> Please let me know if you have any thoughts or suggestions by adding comments 
> in the doc.
> 
> Thanks and regards,
> Bhavyam
> 
