Dmitry, Can you please log a bug for the proposed ‘ORDER BY WITH FILL’ syntax. Link it to https://issues.apache.org/jira/browse/CALCITE-2640 <https://issues.apache.org/jira/browse/CALCITE-2640>, which is the only outstanding issue for time series support.
Julian > On Nov 8, 2022, at 10:00 AM, Julian Hyde <[email protected]> wrote: > > I think of this topic in 3 areas: > > 1. how to extend SQL for interpolation; > 2. how to represent interpolation in relational algebra; > 3. how to implement the relational operator. > > Your question was mostly about 3. I think you could add a ‘fill’ or > ‘interpolate’ relational operator that assumes a sorted input and fills in > the gaps. You could implement it in Enumerable and use it right after an > EnumerableSort. > > Re 1. I don’t like how Clickhouse have done it by extending ORDER BY. > Interpolation is very closely related to sampling - note that when you are > resizing JPG images you think of upsizing and downsizing as variations of the > same operation - and therefore should be done using a relational operator > very similar to TABLESAMPLE. > > In the relational model, data sets aren’t sorted until you print them on the > screen. Clickhouse conflating time series with sorting is worrying, because > the logical next step is to build other quasi-relational operators that > operate on sorted data, and that is not a good direction for a relational > database to be going in. (I distinguish ’sorted data’ from ‘data that has an > ordering’ as used in windowed aggregate functions and some other analytic > functions such as rank and percentile.) > > My favorite time-series extensions to SQL are those done by Vertica[1]. For > most time series analysis, people don’t want to see the expanded time series, > just compute functions over it. And therefore Vertica’s makes time-series > analysis an extension to windowed aggregate functions. With any system other > than Vertica, if you want to do time series analysis over different ranges or > different grains you have to introduce some very painful and expensive > self-joins. > > Julian > > [1] > https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/TIMESERIESClauseAndAggregates.htm > > > > >> On Nov 8, 2022, at 4:06 AM, Dmitry Sysolyatin <[email protected]> >> wrote: >> >> Hi! >> I want to implement ORDER BY WITH FILL [1] functionality. It is similar to >> time_bucket_gapfill and interpolate functions from timescaledb [2]. I >> didn't find a way to do it with existing functionality. >> >> I have an idea to introduce a PostOrder RelNode and implement >> EnumerablePostOrder or extend existing Sort relnode, but it will require a >> lot of changes. Maybe someone knows a more elegant way to do it ? >> >> [1] >> https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier >> [2] >> https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill >
