paul-rogers opened a new issue #11587:
URL: https://github.com/apache/druid/issues/11587


   `TIME_FLOOR` does not work, documentation is incomplete
   
   ### Affected Version
   
   Version: 0.21.
   
   ### Description
   
   Consider the "stock" Wikipedia data source and stock Docker Druid cluster. 
The `TIME_FLOOR` function (and documentation) exhibit a number of problems.
   
   My goal is to round the `__time` column to one hour, eventually to `n` hours 
(or minutes, etc.)
   
   ### Periods in Documentation Should Be Quoted
   
   The 
[documentation](https://druid.apache.org/docs/latest/querying/sql.html#time-functions)
 describes the `period` argument as:
   
   > Period can be any ISO8601 period, like P3M (quarters) or PT12H 
(half-days). 
   
   Note that the period examples are unquoted. By contrast, `DATE_TRUNC` says:
   
   > Unit can be 'milliseconds', 'second', 'minute', 'hour', 'day', 'week', 
'month', 'quarter', 'year', 'decade', 'century', or 'millennium'.
   
   That is, the examples are quoted. So, it is not clear if the `P1H` should be 
unquoted (as with `FLOOR`) or quoted. Let's try.
   
   ```sql
   SELECT
     TIME_FLOOR(__time, 'P1H') AS hr,
     channel,
     page
   FROM "wikiticker-2015-09-12-sampled"
   ```
   
   Error:
   
   ```text
   Error: Unknown exception
   
   Error while applying rule DruidQueryRule(SELECT_PROJECT), args 
[rel#10765:LogicalProject.NONE.[](input=RelSubset#10764,hr=TIME_FLOOR($0, 
'P1H'),channel=$2,page=$16), 
rel#10775:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"wikiticker-2015-09-12-sampled"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["__time","added","channel","cityName","comment","countryIsoCode","countryName","deleted","delta","isAnonymous","isMinor","isNew","isRobot","isUnpatrolled","metroCode","namespace","page","regionIsoCode","regionName","user"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"eefb68de-7cd7-47e8-8056-bd28c8d2296c"},"descending":false,"granularity":{"type":"all"}},signature={__time:LONG,
 added:LONG, channel:STRING, cityName:STRING, comment:STRING, 
countryIsoCode:STRING, coun
 tryName:STRING, deleted:LONG, delta:LONG, isAnonymous:STRING, isMinor:STRING, 
isNew:STRING, isRobot:STRING, isUnpatrolled:STRING, metroCode:STRING, 
namespace:STRING, page:STRING, regionIsoCode:STRING, regionName:STRING, 
user:STRING})]
   
   java.lang.RuntimeException
   ```
   
   Maybe unquoted?
   
   ```sql
   SELECT
     TIME_FLOOR(__time, P1H) AS hr,
     channel,
     page
   FROM "wikiticker-2015-09-12-sampled"
   ```
   
   Definitely not. Error:
   
   ```text
   Error: Plan validation failed
   
   org.apache.calcite.runtime.CalciteContextException: From line 2, column 22 
to line 2, column 24: Column 'P1H' not found in any table
   
   org.apache.calcite.tools.ValidationException
   ```
   
   ### Role of Time Zone is Unclear
   
   The documentation states:
   
   > Rounds down a timestamp, returning it as a new timestamp. ... The time 
zone, if provided, should be a time zone name like "America/Los_Angeles" or 
offset like "-08:00". This function is similar to CEIL but is more flexible.
   
   There is no description of what role the timezone plays. Here is one 
interpretation:
   
   > The input and output values are UTC-based time zones. The timestamp is 
converted to the given timezone when computing periods of a day or greater: the 
local timezone determines when midnight occurs.
   
   It might be worth adding an example, since time conversions are quite 
confusing.
   
   Suppose I have a PST time of `2012-08-12T17:00:00 -08:00`. Convert this to 
UTC and get `2012-08-12T01:00:00Z` in the data. Give `TIME_FLOOR` a time zone 
of `-08:00` and we get our PST time back. We use `P1D` to truncate this to 
`2012-08-12T17:00:00 -08:00`. Now, convert it back to UTC and we get 
`2012-08-11T16:00:00Z`.
   
   Thus, though we've rounded to one day in the local time zone, the hours will 
be non-zero for the UTC time zone (unless the local time zone is UTC.)
   
   ### `TIME_FLOOR` Fails for Two Arguments
   
   This still leaves no explanation for why the first example failed.
   
   The 
[documentation](https://druid.apache.org/docs/latest/querying/sql.html#time-functions)
 describes `TIME_FLOOR` as:
   
   > `TIME_FLOOR(<timestamp_expr>, <period>, [<origin>, [<timezone>]])`
   
   Yet, our example with two arguments fails.
   
   ### Internal Function Signature Does Not Match Documentation
   
   Let's try something else, give the function a timezone of `'UTC'` to see if 
that makes it happy:
   
   ```sql
   SELECT
     TIME_FLOOR(__time, 'P1H', 'UTC') AS hr,
     channel,
     page
   FROM "wikiticker-2015-09-12-sampled"
   ```
   
   Error:
   
   ```text
   Error: Plan validation failed
   
   org.apache.calcite.runtime.CalciteContextException: From line 2, column 3 to 
line 2, column 34: 
   Cannot apply 'TIME_FLOOR' to arguments of type 
   'TIME_FLOOR(<TIMESTAMP(3)>, <CHAR(3)>, <CHAR(3)>)'. 
   Supported form(s): 'TIME_FLOOR(<TIMESTAMP>, <CHARACTER>, <TIMESTAMP>, 
<CHARACTER>)'
   
   org.apache.calcite.tools.ValidationException
   ```
   
   Notice that that the third and fourth arguments do not match the 
documentation:
   
   > `TIME_FLOOR(<timestamp_expr>, <period>, [<origin>, [<timezone>]])`
   
   ### Testing Suggestion
   
   This appears to be a case in which the function would benefit from a robust 
set of unit tests which exercise the options given in the documentation.
   
   As it is, it seems that `TIME_FLOOR` is basically unusable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to