[
https://issues.apache.org/jira/browse/CALCITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598673#comment-17598673
]
Julian Hyde edited comment on CALCITE-5155 at 9/6/22 9:33 PM:
--------------------------------------------------------------
I think it is sufficient to do the following:
* Add variants of {{FLOOR}} and {{CEIL}} SQL functions that take a time unit
name as an argument. For example, {{{}FLOOR(d TO '3 hours'){}}}. The return
type is always the same as {{{}d{}}}.
* Add a variant of {{EXTRACT}} SQL function that takes a time unit name as an
argument.
* Add {{interface UserDefinedTimeUnit}} (see below) and a property
{{TIME_UNIT_FACTORY}} in
[CalciteConnectionProperty|https://calcite.apache.org/javadocAggregate/org/apache/calcite/config/CalciteConnectionProperty.html]
of type {{{}UserDefinedTimeUnit.Factory{}}}.
{code:java}
interface UserDefinedTimeUnit {
// Returns name, e.g. "3 HOUR". Should be upper case, but matching is
case-insensitive.
String name();
// Returns any other aliases.
List<String> aliasList();
// If this time unit nests within other time units, returns those time units
and their
// multipliers. For "3 HOUR", returns [(1/8, DAY), (3, HOUR)].
Map<UserDefinedTimeUnit, Rational> parent();
// Rational number (fraction).
class Rational {
public final int numerator;
public final int denominator;
}
// Built-in time units based on Avatica's enum TimeUnit
Map<String, UserDefinedTimeUnit> BUILT_IN_TIME_UNIT_MAP;
}
{code}
{*}Note 1{*}. {{FLOOR}} and {{CEIL}} are the canonical functions. There are
many other functions such as {{TIMESTAMP_TRUNC}} which can be expressed in
terms of them (both in incoming SQL, handled by the parser and validator for a
particular {{fun}} setting, and in outgoing SQL generated by the JDBC adapter).
{*}Note 2{*}. Calcite will have rules to optimize {{FLOOR(FLOOR(d TO '8 HOUR')
TO DAY)}} to {{FLOOR(d TO DAY)}} based on the parent/child relationships.
Similarly {{CEIL}} of {{{}CEIL{}}}, {{FLOOR}} of {{{}CEIL{}}}, {{CEIL}} of
{{{}FLOOR{}}}.
{*}Note 3{*}. {{FLOOR(d TO u)}} and {{CEIL(d TO u)}} are assumed monotonic in
{{d}} for all time units. Calcite may use this assumption when optimizing range
filters, sorts. They are also assumed idempotent: for all d, u, {{FLOOR(FLOOR(d
TO u) TO u)}} = {{FLOOR(CEIL(d TO u) TO u))}} = {{{}FLOOR(d TO u){}}}.
{*}Note 4{*}. Calcite uses the parent-child relationships (and relationships
inferred by transitive closure and whole-number ratios) in materialized view
matching. For example, it deduces that {{12 HOUR}} is an ancestor of {{3 HOUR}}
since both are a descendant of {{DAY}} and 3 divides 12. And therefore a query
with {{GROUP BY FLOOR(d TO '12 HOUR')}} can be answered by a materialized view
on {{{}FLOOR(d TO '3 HOUR'){}}}.
{*}Note 5{*}. Do we need a new function {{{}EXTRACT_NAME(unit FROM d){}}}?
{{EXTRACT('quarter' FROM DATE '2020-01-01')}} would return integer 1, but
{{EXTRACT_NAME('quarter' FROM DATE '2020-01-01')}} would return string 'Q1'.
Strings may depend on the current locale, and are not monotonic with respect to
{{{}d{}}}.
{*}Note 6{*}. Do we need a function {{{}LAST(d, u1, u2){}}}? For example
{{LAST(DATE '2020-04-29', 'month', 'quarter')}} returns {{{}DATE
'2020-06-01'{}}}, the (floor of) June, 2020, the last month in the quarter that
{{2020-04-29}} belongs to. Perhaps {{LAST(d, u1, u2)}} ==
{{{}DATE_ADD(DATE_ADD(FLOOR(d TO u2), 1, u2), -1, u1){}}}.
was (Author: julianhyde):
I think it is sufficient to do the following:
* Add variants of {{FLOOR}} and {{CEIL}} SQL functions that take a time unit
name as an argument. For example, {{FLOOR(d TO '3 hours')}}. The return type is
always the same as {{d}}.
* Add a variant of {{EXTRACT}} SQL function that takes a time unit name as an
argument.
* Add {{interface UserDefinedTimeUnit}} (see below) and a property
{{TIME_UNIT_FACTORY}} in
[CalciteConnectionProperty|https://calcite.apache.org/javadocAggregate/org/apache/calcite/config/CalciteConnectionProperty.html]
of type {{UserDefinedTimeUnit.Factory}}.
{code:java}
interface UserDefinedTimeUnit {
// Returns name, e.g. "3 HOUR". Should be upper case, but matching is
case-insensitive.
String name();
// Returns any other aliases.
List<String> aliasList();
// If this time unit nests within other time units, returns those time units
and their
// multipliers. For "3 HOUR", returns [(1/8, DAY), (3, HOUR)].
Map<UserDefinedTimeUnit, Rational> parent();
// Rational number (fraction).
class Rational {
public final int numerator;
public final int denominator;
}
// Built-in time units based on Avatica's enum TimeUnit
Map<String, UserDefinedTimeUnit> BUILT_IN_TIME_UNIT_MAP;
}
{code}
{*}Note 1{*}. {{FLOOR}} and {{CEIL}} are the canonical functions. There are
many other functions such as {{TIMESTAMP_TRUNC}} which can be expressed in
terms of them (both in incoming SQL, handled by the parser and validator for a
particular {{fun}} setting, and in outgoing SQL generated by the JDBC adapter).
{*}Note 2{*}. Calcite will have rules to optimize {{FLOOR(FLOOR(d TO '8 HOUR')
TO DAY)}} to {{FLOOR(d TO DAY)}} based on the parent/child relationships.
Similarly {{CEIL}} of {{CEIL}}, {{FLOOR}} of {{CEIL}}, {{CEIL}} of {{FLOOR}}.
{*}Note 3{*}. {{FLOOR(d TO u)}} and {{CEIL(d TO u)}} are assumed monotonic in
{{d}} for all time units. Calcite may use this assumption when optimizing range
filters, sorts. They are also assumed idempotent: for all d, u, {{FLOOR(FLOOR(d
TO u) TO u)}} = {{FLOOR(CEIL(d TO u) TO u))}} = {{FLOOR(d TO u)}}.
{*}Note 4{*}. Calcite uses the parent-child relationships (and relationships
inferred by transitive closure and whole-number ratios) in materialized view
matching. For example, it deduces that {{2 HOUR}} is an ancestor of {{3 HOUR}}
since both are a descendant of {{DAY}} and 3 divides 12. And therefore a query
with {{GROUP BY FLOOR(d TO '12 HOUR')}} can be answered by a materialized view
on {{FLOOR(d TO '3 HOUR')}}.
{*}Note 5{*}. Do we need a new function {{EXTRACT_NAME(unit FROM d)}}?
{{EXTRACT('quarter' FROM DATE '2020-01-01')}} would return integer 1, but
{{EXTRACT_NAME('quarter' FROM DATE '2020-01-01')}} would return string 'Q1'.
Strings may depend on the current locale, and are not monotonic with respect to
{{d}}.
{*}Note 6{*}. Do we need a function {{{}LAST(d, u1, u2){}}}? For example
{{LAST(DATE '2020-04-29', 'month', 'quarter')}} returns {{{}DATE
'2020-06-01'{}}}, the (floor of) June, 2020, the last month in the quarter that
{{2020-04-29}} belongs to. Perhaps {{LAST(d, u1, u2)}} ==
{{DATE_ADD(DATE_ADD(FLOOR(d TO u2), 1, u2), -1, u1)}}.
> Extensible time units
> ---------------------
>
> Key: CALCITE-5155
> URL: https://issues.apache.org/jira/browse/CALCITE-5155
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
>
> Allow a type system to define its own time units and how they are rolled up.
> Currently, time units are used in the {{EXTRACT}}, {{FLOOR}}, {{TRUNC}}
> functions, and include {{YEAR}}, {{QUARTER}}, {{MONTH}}, {{HOUR}},
> {{MINUTE}}, {{NANOSECOND}}. For example {{FLOOR(t TO HOUR)}} is valid.
> A type system would be allowed to define extra time units. Once a time unit
> is defined the {{EXTRACT}}, {{FLOOR}} and {{TRUNC}} functions should just
> work.
> The definition of might consist of a base unit and multiplier. So
> {{MINUTE15}} would be based on {{MINUTE}} with a multiplier of 15.
> Various rules know that you can roll up {{FLOOR(t TO DAY)}} to {{FLOOR(t TO
> MONTH)}} but you cannot roll {{FLOOR(t TO WEEK)}} to {{FLOOR(t TO MONTH)}}.
> When you define a new time unit, the type system can deduce that full set of
> time units that it can roll up to, and which can roll up to it.
> Should we support time units that do not evenly divide the next largest time
> unit? For example the number of seconds since the top of the hour modulo 7.
> 60 and 3,600 are not a multiples of 7, so {{SecondOfHourMod7}} would be
> different from {{SecondOfMinuteMod7}}.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)