paul-rogers opened a new issue #11587: URL: https://github.com/apache/druid/issues/11587
`TIME_FLOOR` does not work, documentation is incomplete ### Affected Version Version: 0.21. ### Description Consider the "stock" Wikipedia data source and stock Docker Druid cluster. The `TIME_FLOOR` function (and documentation) exhibit a number of problems. My goal is to round the `__time` column to one hour, eventually to `n` hours (or minutes, etc.) ### Periods in Documentation Should Be Quoted The [documentation](https://druid.apache.org/docs/latest/querying/sql.html#time-functions) describes the `period` argument as: > Period can be any ISO8601 period, like P3M (quarters) or PT12H (half-days). Note that the period examples are unquoted. By contrast, `DATE_TRUNC` says: > Unit can be 'milliseconds', 'second', 'minute', 'hour', 'day', 'week', 'month', 'quarter', 'year', 'decade', 'century', or 'millennium'. That is, the examples are quoted. So, it is not clear if the `P1H` should be unquoted (as with `FLOOR`) or quoted. Let's try. ```sql SELECT TIME_FLOOR(__time, 'P1H') AS hr, channel, page FROM "wikiticker-2015-09-12-sampled" ``` Error: ```text Error: Unknown exception Error while applying rule DruidQueryRule(SELECT_PROJECT), args [rel#10765:LogicalProject.NONE.[](input=RelSubset#10764,hr=TIME_FLOOR($0, 'P1H'),channel=$2,page=$16), rel#10775:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"wikiticker-2015-09-12-sampled"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["__time","added","channel","cityName","comment","countryIsoCode","countryName","deleted","delta","isAnonymous","isMinor","isNew","isRobot","isUnpatrolled","metroCode","namespace","page","regionIsoCode","regionName","user"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"eefb68de-7cd7-47e8-8056-bd28c8d2296c"},"descending":false,"granularity":{"type":"all"}},signature={__time:LONG, added:LONG, channel:STRING, cityName:STRING, comment:STRING, countryIsoCode:STRING, coun tryName:STRING, deleted:LONG, delta:LONG, isAnonymous:STRING, isMinor:STRING, isNew:STRING, isRobot:STRING, isUnpatrolled:STRING, metroCode:STRING, namespace:STRING, page:STRING, regionIsoCode:STRING, regionName:STRING, user:STRING})] java.lang.RuntimeException ``` Maybe unquoted? ```sql SELECT TIME_FLOOR(__time, P1H) AS hr, channel, page FROM "wikiticker-2015-09-12-sampled" ``` Definitely not. Error: ```text Error: Plan validation failed org.apache.calcite.runtime.CalciteContextException: From line 2, column 22 to line 2, column 24: Column 'P1H' not found in any table org.apache.calcite.tools.ValidationException ``` ### Role of Time Zone is Unclear The documentation states: > Rounds down a timestamp, returning it as a new timestamp. ... The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00". This function is similar to CEIL but is more flexible. There is no description of what role the timezone plays. Here is one interpretation: > The input and output values are UTC-based time zones. The timestamp is converted to the given timezone when computing periods of a day or greater: the local timezone determines when midnight occurs. It might be worth adding an example, since time conversions are quite confusing. Suppose I have a PST time of `2012-08-12T17:00:00 -08:00`. Convert this to UTC and get `2012-08-12T01:00:00Z` in the data. Give `TIME_FLOOR` a time zone of `-08:00` and we get our PST time back. We use `P1D` to truncate this to `2012-08-12T17:00:00 -08:00`. Now, convert it back to UTC and we get `2012-08-11T16:00:00Z`. Thus, though we've rounded to one day in the local time zone, the hours will be non-zero for the UTC time zone (unless the local time zone is UTC.) ### `TIME_FLOOR` Fails for Two Arguments This still leaves no explanation for why the first example failed. The [documentation](https://druid.apache.org/docs/latest/querying/sql.html#time-functions) describes `TIME_FLOOR` as: > `TIME_FLOOR(<timestamp_expr>, <period>, [<origin>, [<timezone>]])` Yet, our example with two arguments fails. ### Internal Function Signature Does Not Match Documentation Let's try something else, give the function a timezone of `'UTC'` to see if that makes it happy: ```sql SELECT TIME_FLOOR(__time, 'P1H', 'UTC') AS hr, channel, page FROM "wikiticker-2015-09-12-sampled" ``` Error: ```text Error: Plan validation failed org.apache.calcite.runtime.CalciteContextException: From line 2, column 3 to line 2, column 34: Cannot apply 'TIME_FLOOR' to arguments of type 'TIME_FLOOR(<TIMESTAMP(3)>, <CHAR(3)>, <CHAR(3)>)'. Supported form(s): 'TIME_FLOOR(<TIMESTAMP>, <CHARACTER>, <TIMESTAMP>, <CHARACTER>)' org.apache.calcite.tools.ValidationException ``` Notice that that the third and fourth arguments do not match the documentation: > `TIME_FLOOR(<timestamp_expr>, <period>, [<origin>, [<timezone>]])` ### Testing Suggestion This appears to be a case in which the function would benefit from a robust set of unit tests which exercise the options given in the documentation. As it is, it seems that `TIME_FLOOR` is basically unusable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
