Re: Implementation of ORDER BY WITH FILL [INTERPOLATE]

Dmitry Sysolyatin Thu, 10 Nov 2022 06:24:13 -0800

Julian,

Good point, I didn't think about it. I have created jira ticket -
https://issues.apache.org/jira/browse/CALCITE-5375


On Tue, Nov 8, 2022 at 10:02 PM Julian Hyde <[email protected]> wrote:

> Dmitry,
>
> Can you please log a bug for the proposed ‘ORDER BY WITH FILL’ syntax.
> Link it to https://issues.apache.org/jira/browse/CALCITE-2640 <
> https://issues.apache.org/jira/browse/CALCITE-2640>, which is the only
> outstanding issue for time series support.
>
> Julian
>
>
>
> > On Nov 8, 2022, at 10:00 AM, Julian Hyde <[email protected]> wrote:
> >
> > I think of this topic in 3 areas:
> >
> > 1. how to extend SQL for interpolation;
> > 2. how to represent interpolation in relational algebra;
> > 3. how to implement the relational operator.
> >
> > Your question was mostly about 3. I think you could add a ‘fill’ or
> ‘interpolate’ relational operator that assumes a sorted input and fills in
> the gaps. You could implement it in Enumerable and use it right after an
> EnumerableSort.
> >
> > Re 1. I don’t like how Clickhouse have done it by extending ORDER BY.
> Interpolation is very closely related to sampling - note that when you are
> resizing JPG images you think of upsizing and downsizing as variations of
> the same operation - and therefore should be done using a relational
> operator very similar to TABLESAMPLE.
> >
> > In the relational model, data sets aren’t sorted until you print them on
> the screen. Clickhouse conflating time series with sorting is worrying,
> because the  logical next step is to build other quasi-relational operators
> that operate on sorted data, and that is not a good direction for a
> relational database to be going in. (I distinguish ’sorted data’ from ‘data
> that has an ordering’ as used in windowed aggregate functions and some
> other analytic functions such as rank and percentile.)
> >
> > My favorite time-series extensions to SQL are those done by Vertica[1].
> For most time series analysis, people don’t want to see the expanded time
> series, just compute functions over it. And therefore Vertica’s makes
> time-series analysis an extension to windowed aggregate functions. With any
> system other than Vertica, if you want to do time series analysis over
> different ranges or different grains you have to introduce some very
> painful and expensive self-joins.
> >
> > Julian
> >
> > [1]
> https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AnalyzingData/TimeSeries/TIMESERIESClauseAndAggregates.htm
> >
> >
> >
> >
> >> On Nov 8, 2022, at 4:06 AM, Dmitry Sysolyatin <[email protected]>
> wrote:
> >>
> >> Hi!
> >> I want to implement ORDER BY WITH FILL [1] functionality. It is similar
> to
> >> time_bucket_gapfill and  interpolate functions from timescaledb [2]. I
> >> didn't find a way to do it with existing functionality.
> >>
> >> I have an idea to introduce a PostOrder RelNode and implement
> >> EnumerablePostOrder or extend existing Sort relnode, but it will
> require a
> >> lot of changes. Maybe someone knows a more elegant way to do it  ?
> >>
> >> [1]
> >>
> https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#order-by-expr-with-fill-modifier
> >> [2]
> >>
> https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill
> >
>
>

Re: Implementation of ORDER BY WITH FILL [INTERPOLATE]

Reply via email to