jacobmarble commented on issue #3015: URL: https://github.com/apache/arrow-datafusion/issues/3015#issuecomment-1204739188
I don't know how far DataFusion likes to deviate from PostgreSQL-flavored SQL, but here's an idea for this feature. ### Suggestion 1 Make the default value of the `origin` argument `1970-01-01 00:00:00 UTC`. It seems there is no default value specified by PostgreSQL. Certainly there exists a need for `origin` values like `XXXX-XX-XX 00:02:30 UTC`, but the far more common uses of `date_bin` are certainly with `stride` values like 1s, 10s, 30s, 1m, 5m, 10m, 15m, 20m, 30m, 1h, 2h, 3h, 6h, etc, with the reasonable expectation that the result of `date_bin` align with a top-of-the-hour origin. The result is that 99% of queries written with this function will be shorter and less error prone. Some proposed, equivalent statements: ```sql -- as specified by PostgreSQL - origin at Unix epoch select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp '1970-01-01 00:00:00 UTC'); -- 2022-08-03 14:45:00 -- proposed select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST'); ``` ### Suggestion 2 Allow `origin` values with type `timestamp`, `date`, `time` or `interval`. Again, PostgreSQL doesn't have any prior art here. The result enables better readability. Some proposed, equivalent statements: ```sql -- as specified by PostgreSQL - origin shifts bins forward 5 minutes select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp '1970-01-01 00:05:00 UTC'); -- 2022-08-03 14:35:00 -- proposed select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', time '00:05:00'); select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', interval '5 minutes'); ``` ```sql -- as specified by PostgreSQL - origin shifts bins back 5 minutes select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', timestamp '1969-12-31 23:55:00 UTC'); -- 2022-08-03 14:40:00 -- proposed select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', - time '00:05:00'); select date_bin('15 minutes', timestamp '2022-08-03 14:49:50 PST', interval '- 5 minutes'); ``` ```sql -- as specified by PostgreSQL - origin shifts bins forward one day select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', timestamp '1970-01-02 00:00:00 UTC'); -- 2022-07-29 00:00:00 -- proposed select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', date '1970-01-02'); select date_bin('7 days', timestamp '2022-08-03 14:49:50 PST', interval '1 day'); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org