Abdullahsab3 commented on issue #10069: URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2373531351
I think switching `generate_series` to a UDTF will break the API. I think in most cases it's an acceptable API change. However there is (at least for me) 1 particular use case that of the array generated from `generate_series` that will probably need closer attention. <details> <summary>Example</summary> I need to spread the data equally over equal intervals between 2 data points. I use `generate_series` to generate an array per row. The way that I do this now is as follows: ```sql create table manual(time timestamp, index_value DOUBLE); insert into manual (time, index_value) values ('2022-01-31 23:00:00.0000000', 935035.000), ('2022-02-28 23:00:00.0000000', 954469.000), ('2022-03-31 22:00:00.0000000', 905204.000), ('2022-04-30 22:00:00.0000000', 971415.000); ``` ```sql with index_values AS ( select time AS to_time, lag(time, 1) over (order by time) AS from_time, index_value AS to_index, lag(index_value, 1) over (order by time) AS from_index from manual ), series AS ( select unnest(generate_series(from_time, to_time, interval '1 month')) AS time, (case when to_index < from_index THEN (999999 - from_index) + to_index else to_index - from_index end) / coalesce(nullif(array_length(generate_series(from_time, to_time, interval '1 month')), 0), 1) AS delta_value from index_values ) SELECT * FROM series; ``` Specifically `array_length(generate_series(from_time, to_time, interval '1 month'))` will need to be done differently if `generate_series` becomes a UDTF. It will probably need to become `COUNT(time) OVER (partition by to_time, from_time)` : ```sql with index_values AS ( select time AS to_time, lag(time, 1) over (order by time) AS from_time, index_value AS to_index, lag(index_value, 1) over (order by time) AS from_index from manual ), series AS ( select time, (case when to_index < from_index THEN (999999 - from_index) + to_index else to_index - from_index end) / coalesce(nullif(count(time) over (partition by to_time, from_time), 0), 1) AS delta_value from index_values, generate_series(from_time, to_time, interval '900 seconds') AS time ) SELECT * FROM series; ``` (If you have other/better suggestions for how data can be spreaded/interpolated in raw DF, all ears :) ) </details> TL;DR: If you're using `generate_series` to generate a series on a row-based basis, you may need to consider `OVER PARTITION BY (your_row_composite_key)` if this API change takes place Just mentioning this here in case anyone is using `generate_series` in a similar way as I do -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org