Julian, Good point, I didn't think about it. I have created jira ticket - https://issues.apache.org/jira/browse/CALCITE-5375
On Tue, Nov 8, 2022 at 10:02 PM Julian Hyde <[email protected]> wrote: > Dmitry, > > Can you please log a bug for the proposed ‘ORDER BY WITH FILL’ syntax. > Link it to https://issues.apache.org/jira/browse/CALCITE-2640 < > https://issues.apache.org/jira/browse/CALCITE-2640>, which is the only > outstanding issue for time series support. > > Julian > > > > > On Nov 8, 2022, at 10:00 AM, Julian Hyde <[email protected]> wrote: > > > > I think of this topic in 3 areas: > > > > 1. how to extend SQL for interpolation; > > 2. how to represent interpolation in relational algebra; > > 3. how to implement the relational operator. > > > > Your question was mostly about 3. I think you could add a ‘fill’ or > ‘interpolate’ relational operator that assumes a sorted input and fills in > the gaps. You could implement it in Enumerable and use it right after an > EnumerableSort. > > > > Re 1. I don’t like how Clickhouse have done it by extending ORDER BY. > Interpolation is very closely related to sampling - note that when you are > resizing JPG images you think of upsizing and downsizing as variations of > the same operation - and therefore should be done using a relational > operator very similar to TABLESAMPLE. > > > > In the relational model, data sets aren’t sorted until you print them on > the screen. Clickhouse conflating time series with sorting is worrying, > because the logical next step is to build other quasi-relational operators > that operate on sorted data, and that is not a good direction for a > relational database to be going in. (I distinguish ’sorted data’ from ‘data > that has an ordering’ as used in windowed aggregate functions and some > other analytic functions such as rank and percentile.) > > > > My favorite time-series extensions to SQL are those done by Vertica[1]. > For most time series analysis, people don’t want to see the expanded time > series, just compute functions over it. And therefore Vertica’s makes > time-series analysis an extension to windowed aggregate functions. With any > system other than Vertica, if you want to do time series analysis over > different ranges or different grains you have to introduce some very > painful and expensive self-joins. > > > > Julian > > > > [1] > https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/TIMESERIESClauseAndAggregates.htm > > > > > > > > > >> On Nov 8, 2022, at 4:06 AM, Dmitry Sysolyatin <[email protected]> > wrote: > >> > >> Hi! > >> I want to implement ORDER BY WITH FILL [1] functionality. It is similar > to > >> time_bucket_gapfill and interpolate functions from timescaledb [2]. I > >> didn't find a way to do it with existing functionality. > >> > >> I have an idea to introduce a PostOrder RelNode and implement > >> EnumerablePostOrder or extend existing Sort relnode, but it will > require a > >> lot of changes. Maybe someone knows a more elegant way to do it ? > >> > >> [1] > >> > https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier > >> [2] > >> > https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill > > > >
