Abdullahsab3 commented on issue #10069:
URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2373531351
I think switching `generate_series` to a UDTF will break the API. I think in
most cases it's an acceptable API change. However there is (at least for me) 1
particular use case that of the array generated from `generate_series` that
will probably need closer attention.
<details>
<summary>Example</summary>
I need to spread the data equally over equal intervals between 2 data
points. I use `generate_series` to generate an array per row. The way that I do
this now is as follows:
```sql
create table manual(time timestamp, index_value DOUBLE);
insert into
manual (time, index_value)
values
('2022-01-31 23:00:00.0000000', 935035.000),
('2022-02-28 23:00:00.0000000', 954469.000),
('2022-03-31 22:00:00.0000000', 905204.000),
('2022-04-30 22:00:00.0000000', 971415.000);
```
```sql
with
index_values AS (
select
time AS to_time,
lag(time, 1) over (order by time) AS from_time,
index_value AS to_index,
lag(index_value, 1) over (order by time) AS from_index
from manual
),
series AS (
select
unnest(generate_series(from_time, to_time, interval '1 month'))
AS time,
(case
when to_index < from_index THEN (999999 - from_index) +
to_index
else to_index - from_index end)
/ coalesce(nullif(array_length(generate_series(from_time,
to_time, interval '1 month')), 0), 1) AS delta_value
from index_values
) SELECT * FROM series;
```
Specifically `array_length(generate_series(from_time, to_time, interval '1
month'))` will need to be done differently if `generate_series` becomes a UDTF.
It will probably need to become `COUNT(time) OVER (partition by to_time,
from_time)` :
```sql
with
index_values AS (
select
time AS to_time,
lag(time, 1) over (order by time) AS from_time,
index_value AS to_index,
lag(index_value, 1) over (order by time) AS from_index
from manual
),
series AS (
select
time,
(case
when to_index < from_index THEN (999999 - from_index) +
to_index
else to_index - from_index end)
/ coalesce(nullif(count(time) over (partition by to_time,
from_time), 0), 1) AS delta_value
from index_values, generate_series(from_time, to_time, interval '900
seconds') AS time
) SELECT * FROM series;
```
(If you have other/better suggestions for how data can be
spreaded/interpolated in raw DF, all ears :) )
</details>
TL;DR: If you're using `generate_series` to generate a series on a row-based
basis, you may need to consider `OVER PARTITION BY (your_row_composite_key)` if
this API change takes place
Just mentioning this here in case anyone is using `generate_series` in a
similar way as I do
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]