I think of this topic in 3 areas:

1. how to extend SQL for interpolation;
2. how to represent interpolation in relational algebra;
3. how to implement the relational operator.

Your question was mostly about 3. I think you could add a ‘fill’ or 
‘interpolate’ relational operator that assumes a sorted input and fills in the 
gaps. You could implement it in Enumerable and use it right after an 
EnumerableSort.

Re 1. I don’t like how Clickhouse have done it by extending ORDER BY. 
Interpolation is very closely related to sampling - note that when you are 
resizing JPG images you think of upsizing and downsizing as variations of the 
same operation - and therefore should be done using a relational operator very 
similar to TABLESAMPLE.

In the relational model, data sets aren’t sorted until you print them on the 
screen. Clickhouse conflating time series with sorting is worrying, because the 
 logical next step is to build other quasi-relational operators that operate on 
sorted data, and that is not a good direction for a relational database to be 
going in. (I distinguish ’sorted data’ from ‘data that has an ordering’ as used 
in windowed aggregate functions and some other analytic functions such as rank 
and percentile.)

My favorite time-series extensions to SQL are those done by Vertica[1]. For 
most time series analysis, people don’t want to see the expanded time series, 
just compute functions over it. And therefore Vertica’s makes time-series 
analysis an extension to windowed aggregate functions. With any system other 
than Vertica, if you want to do time series analysis over different ranges or 
different grains you have to introduce some very painful and expensive 
self-joins.

Julian

[1] 
https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/TIMESERIESClauseAndAggregates.htm




> On Nov 8, 2022, at 4:06 AM, Dmitry Sysolyatin <[email protected]> wrote:
> 
> Hi!
> I want to implement ORDER BY WITH FILL [1] functionality. It is similar to
> time_bucket_gapfill and  interpolate functions from timescaledb [2]. I
> didn't find a way to do it with existing functionality.
> 
> I have an idea to introduce a PostOrder RelNode and implement
> EnumerablePostOrder or extend existing Sort relnode, but it will require a
> lot of changes. Maybe someone knows a more elegant way to do it  ?
> 
> [1]
> https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier
> [2]
> https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill

Reply via email to