Dmitry,

Can you please log a bug for the proposed ‘ORDER BY WITH FILL’ syntax. Link it 
to https://issues.apache.org/jira/browse/CALCITE-2640 
<https://issues.apache.org/jira/browse/CALCITE-2640>, which is the only 
outstanding issue for time series support.

Julian



> On Nov 8, 2022, at 10:00 AM, Julian Hyde <[email protected]> wrote:
> 
> I think of this topic in 3 areas:
> 
> 1. how to extend SQL for interpolation;
> 2. how to represent interpolation in relational algebra;
> 3. how to implement the relational operator.
> 
> Your question was mostly about 3. I think you could add a ‘fill’ or 
> ‘interpolate’ relational operator that assumes a sorted input and fills in 
> the gaps. You could implement it in Enumerable and use it right after an 
> EnumerableSort.
> 
> Re 1. I don’t like how Clickhouse have done it by extending ORDER BY. 
> Interpolation is very closely related to sampling - note that when you are 
> resizing JPG images you think of upsizing and downsizing as variations of the 
> same operation - and therefore should be done using a relational operator 
> very similar to TABLESAMPLE.
> 
> In the relational model, data sets aren’t sorted until you print them on the 
> screen. Clickhouse conflating time series with sorting is worrying, because 
> the  logical next step is to build other quasi-relational operators that 
> operate on sorted data, and that is not a good direction for a relational 
> database to be going in. (I distinguish ’sorted data’ from ‘data that has an 
> ordering’ as used in windowed aggregate functions and some other analytic 
> functions such as rank and percentile.)
> 
> My favorite time-series extensions to SQL are those done by Vertica[1]. For 
> most time series analysis, people don’t want to see the expanded time series, 
> just compute functions over it. And therefore Vertica’s makes time-series 
> analysis an extension to windowed aggregate functions. With any system other 
> than Vertica, if you want to do time series analysis over different ranges or 
> different grains you have to introduce some very painful and expensive 
> self-joins.
> 
> Julian
> 
> [1] 
> https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/TIMESERIESClauseAndAggregates.htm
> 
> 
> 
> 
>> On Nov 8, 2022, at 4:06 AM, Dmitry Sysolyatin <[email protected]> 
>> wrote:
>> 
>> Hi!
>> I want to implement ORDER BY WITH FILL [1] functionality. It is similar to
>> time_bucket_gapfill and  interpolate functions from timescaledb [2]. I
>> didn't find a way to do it with existing functionality.
>> 
>> I have an idea to introduce a PostOrder RelNode and implement
>> EnumerablePostOrder or extend existing Sort relnode, but it will require a
>> lot of changes. Maybe someone knows a more elegant way to do it  ?
>> 
>> [1]
>> https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier
>> [2]
>> https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill
> 

Reply via email to