Hi Team,
I’d like to start a discussion around the current design of the INTERVAL
type in AsterixDB and propose splitting it into three distinct types:
- INTERVAL_DATE
- INTERVAL_TIME
- INTERVAL_DATETIME
*Background*
Today, INTERVAL is effectively an overloaded type whose semantics depend on
the underlying endpoint types (DATE, TIME, or DATETIME). This is visible,
for example, in AIntervalConstructorDescriptor, where the interval’s
behavior and internal representation are determined dynamically based on
the serialized type tag of the inputs:
```
switch (intervalType) {
    case DATE:
        intervalStart = ADateSerializerDeserializer.getChronon(...);
        intervalEnd = ADateSerializerDeserializer.getChronon(...);
        break;
    case TIME:
        intervalStart = ATimeSerializerDeserializer.getChronon(...);
        intervalEnd = ATimeSerializerDeserializer.getChronon(...);
        break;
    case DATETIME:
        intervalStart = ADateTimeSerializerDeserializer.getChronon(...);
        intervalEnd = ADateTimeSerializerDeserializer.getChronon(...);
        break;
    ...
}
```
As a result:
- A single INTERVAL type can represent *date intervals*, *time intervals*,
  or *datetime intervals*
- The physical width of endpoints differs (DATE/TIME are 4 bytes, DATETIME
  is 8 bytes)
- Semantics such as ordering, comparison, and statistics are inherently
  type-dependent
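To make the points above concrete, here is a minimal Java sketch of what a tagged interval implies. It is not AsterixDB code: `EndpointTag`, `TaggedInterval`, `endpointWidth`, and `compareByStart` are hypothetical names used only for illustration, and the endpoint widths are the ones quoted above.

```java
// Hypothetical model of an overloaded interval; not AsterixDB's actual classes.
enum EndpointTag { DATE, TIME, DATETIME }

final class TaggedInterval {
    final EndpointTag tag;
    final long start; // chronon, expressed in the tag's own domain
    final long end;

    TaggedInterval(EndpointTag tag, long start, long end) {
        this.tag = tag;
        this.start = start;
        this.end = end;
    }

    // Endpoint width in bytes depends on the tag, not on the type.
    int endpointWidth() {
        return tag == EndpointTag.DATETIME ? 8 : 4;
    }

    // The type alone does not say which domain the endpoints belong to,
    // so a mismatch can only be detected at run time.
    static int compareByStart(TaggedInterval a, TaggedInterval b) {
        if (a.tag != b.tag) {
            throw new IllegalArgumentException(
                    "cannot compare " + a.tag + " interval with " + b.tag + " interval");
        }
        return Long.compare(a.start, b.start);
    }
}
```

In this sketch, every consumer of the type (comparators, statistics, serializers) has to branch on the tag or fail at run time, which is the overloading described above.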
*Motivation*
This overloading creates several challenges:
1. *Comparability and ordering*
   - Intervals are only meaningfully comparable when their endpoint
     domains match
   - A generic INTERVAL type prevents us from expressing this at the type
     level
2. *Optimizer & storage implications*
   - Min/max statistics and ordering assumptions are unclear or unsafe for
     mixed-interval semantics
   - Filter pushdown and reasoning become more complex than necessary
3. *Type safety & clarity*
   - The interval’s actual semantics are implicit, not explicit
Conceptually, INTERVAL today behaves like three distinct types sharing a
constructor, rather than a single coherent type.
*Proposal*
Introduce three explicit interval types:
- INTERVAL_DATE → interval between DATE values
- INTERVAL_TIME → interval between TIME values
- INTERVAL_DATETIME → interval between DATETIME values
Each would:
- Have well-defined ordering and comparison semantics within its domain
- Make type errors visible earlier and simplify reasoning across the engine
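As a rough illustration of the direction (not a proposed implementation), the split could look like the Java sketch below. All class names (`DateInterval`, `TimeInterval`, `DateTimeInterval`, `IntervalStats`) are hypothetical, and ordering by start and then by end is an assumption made for the example, not something decided here.

```java
// Hypothetical sketch: none of these classes exist in AsterixDB today.
final class DateInterval implements Comparable<DateInterval> {
    final int start; // DATE chronon, 4 bytes
    final int end;

    DateInterval(int start, int end) { this.start = start; this.end = end; }

    // Assumed ordering for illustration: by start, then by end.
    @Override
    public int compareTo(DateInterval o) {
        int c = Integer.compare(start, o.start);
        return c != 0 ? c : Integer.compare(end, o.end);
    }
}

final class TimeInterval implements Comparable<TimeInterval> {
    final int start; // TIME chronon, 4 bytes
    final int end;

    TimeInterval(int start, int end) { this.start = start; this.end = end; }

    @Override
    public int compareTo(TimeInterval o) {
        int c = Integer.compare(start, o.start);
        return c != 0 ? c : Integer.compare(end, o.end);
    }
}

final class DateTimeInterval implements Comparable<DateTimeInterval> {
    final long start; // DATETIME chronon, 8 bytes
    final long end;

    DateTimeInterval(long start, long end) { this.start = start; this.end = end; }

    @Override
    public int compareTo(DateTimeInterval o) {
        int c = Long.compare(start, o.start);
        return c != 0 ? c : Long.compare(end, o.end);
    }
}

final class IntervalStats {
    // Minimum within a single interval domain; T ties both arguments
    // to the same interval type.
    static <T extends Comparable<T>> T min(T a, T b) {
        return a.compareTo(b) <= 0 ? a : b;
    }
}
```

With types like these, a call such as `dateInterval.compareTo(timeInterval)` does not compile, so a cross-domain comparison is rejected before it can reach the comparator or the statistics code.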
I’m happy to prototype if there’s agreement on the direction.
Looking forward to feedback and discussion.
Best regards,
Ritik