I think of this topic in 3 areas: 1. how to extend SQL for interpolation; 2. how to represent interpolation in relational algebra; 3. how to implement the relational operator.
Your question was mostly about 3. I think you could add a ‘fill’ or ‘interpolate’ relational operator that assumes a sorted input and fills in the gaps. You could implement it in Enumerable and use it right after an EnumerableSort. Re 1. I don’t like how Clickhouse have done it by extending ORDER BY. Interpolation is very closely related to sampling - note that when you are resizing JPG images you think of upsizing and downsizing as variations of the same operation - and therefore should be done using a relational operator very similar to TABLESAMPLE. In the relational model, data sets aren’t sorted until you print them on the screen. Clickhouse conflating time series with sorting is worrying, because the logical next step is to build other quasi-relational operators that operate on sorted data, and that is not a good direction for a relational database to be going in. (I distinguish ’sorted data’ from ‘data that has an ordering’ as used in windowed aggregate functions and some other analytic functions such as rank and percentile.) My favorite time-series extensions to SQL are those done by Vertica[1]. For most time series analysis, people don’t want to see the expanded time series, just compute functions over it. And therefore Vertica’s makes time-series analysis an extension to windowed aggregate functions. With any system other than Vertica, if you want to do time series analysis over different ranges or different grains you have to introduce some very painful and expensive self-joins. Julian [1] https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/TIMESERIESClauseAndAggregates.htm > On Nov 8, 2022, at 4:06 AM, Dmitry Sysolyatin <[email protected]> wrote: > > Hi! > I want to implement ORDER BY WITH FILL [1] functionality. It is similar to > time_bucket_gapfill and interpolate functions from timescaledb [2]. I > didn't find a way to do it with existing functionality. > > I have an idea to introduce a PostOrder RelNode and implement > EnumerablePostOrder or extend existing Sort relnode, but it will require a > lot of changes. Maybe someone knows a more elegant way to do it ? > > [1] > https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier > [2] > https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill
