andygrove opened a new issue, #4540:
URL: https://github.com/apache/datafusion-comet/issues/4540
## What is the problem the feature request solves?
Comet has no support for Spark's interval data types:
- `CalendarIntervalType` (months + days + microseconds)
- `YearMonthIntervalType` (ANSI `INTERVAL YEAR TO MONTH`)
- `DayTimeIntervalType` (ANSI `INTERVAL DAY TO SECOND`)

Because the types are unsupported, every expression that produces or
consumes an interval falls back to Spark. Any query that carries an
interval column through a Comet operator falls back as well.

`CometBatchKernelCodegen.isSupportedDataType` also rejects these types, so
interval expressions cannot even be routed through the JVM codegen dispatcher
(see #4506 / #4538). They are therefore a genuine gap in Arrow-native support,
and codegen offers no stopgap.

This issue tracks the foundational type support plus the family of
expressions that depends on it. The type work is a prerequisite for the
already-filed per-expression requests listed below.
## Describe the potential solution
### Type support (prerequisite)
- Map the three Spark interval types to Arrow:
  - `YearMonthIntervalType` -> Arrow `Interval(YearMonth)`
  - `DayTimeIntervalType` -> Arrow `Interval(MonthDayNano)` or `Duration`
    (choose the representation that round-trips losslessly with Spark's
    microsecond storage)
  - `CalendarIntervalType` -> Arrow `Interval(MonthDayNano)` (Spark stores
    months/days/micros)
- Wire the types through the `CometVector` hierarchy, FFI import/export
  (`NativeUtil` / `scan.rs`), and `serializeDataType` in `QueryPlanSerde`.
- Allow these types in `CometBatchKernelCodegen.isSupportedDataType` once
  the FFI path is correct, so codegen dispatch can also cover interval
  expressions.
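
A minimal sketch of the round-trip question, written in Rust because Comet's native side is Rust. The structs and functions are hypothetical plain mirrors of the physical layouts, not arrow-rs or Comet types. The layouts assumed are: Spark's `YearMonthIntervalType` is an `i32` of months, `DayTimeIntervalType` an `i64` of microseconds, and `CalendarInterval` months/days/microseconds. Arrow's `Interval(YearMonth)` is `i32` months, `Duration(Microsecond)` an `i64`, and `Interval(MonthDayNano)` `i32` months, `i32` days, `i64` nanoseconds. The sketch suggests that `Duration(Microsecond)` is an identity mapping for day-time intervals. `MonthDayNano`, by contrast, needs a rescale by 1,000 that can overflow unless whole days are split out first.

```rust
const NANOS_PER_MICRO: i64 = 1_000;
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical mirror of Spark's CalendarInterval storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SparkCalendarInterval {
    pub months: i32,
    pub days: i32,
    pub microseconds: i64,
}

/// Hypothetical mirror of Arrow's Interval(MonthDayNano) value layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

/// YearMonthIntervalType (i32 months) -> Interval(YearMonth) (i32 months): identity.
pub fn year_month_to_arrow(months: i32) -> i32 {
    months
}

/// DayTimeIntervalType (i64 micros) -> Duration(Microsecond) (i64): identity,
/// so the full Spark range round-trips with no rescaling.
pub fn day_time_to_duration_micros(micros: i64) -> i64 {
    micros
}

/// DayTimeIntervalType -> MonthDayNano. Whole days are split out so the
/// sub-day remainder times 1,000 cannot overflow. Assumes one day is exactly
/// 24 hours, as in Spark's day-time intervals.
pub fn day_time_to_mdn(micros: i64) -> MonthDayNano {
    let days = micros / MICROS_PER_DAY; // |days| <= 106_751_991, fits in i32
    let rem_micros = micros % MICROS_PER_DAY; // same sign as micros
    MonthDayNano {
        months: 0,
        days: days as i32,
        nanoseconds: rem_micros * NANOS_PER_MICRO,
    }
}

/// Inverse of day_time_to_mdn. Returns None if the value has a month
/// component, sub-microsecond precision, or does not fit in i64 micros.
pub fn mdn_to_day_time(v: MonthDayNano) -> Option<i64> {
    if v.months != 0 || v.nanoseconds % NANOS_PER_MICRO != 0 {
        return None;
    }
    i64::from(v.days)
        .checked_mul(MICROS_PER_DAY)?
        .checked_add(v.nanoseconds / NANOS_PER_MICRO)
}

/// CalendarIntervalType -> MonthDayNano. Micros are rescaled in place (not
/// folded into days), so |micros| > i64::MAX / 1000 yields None.
pub fn calendar_to_mdn(ci: SparkCalendarInterval) -> Option<MonthDayNano> {
    let nanoseconds = ci.microseconds.checked_mul(NANOS_PER_MICRO)?;
    Some(MonthDayNano { months: ci.months, days: ci.days, nanoseconds })
}

/// Inverse of calendar_to_mdn; None if nanoseconds is not whole microseconds.
pub fn mdn_to_calendar(v: MonthDayNano) -> Option<SparkCalendarInterval> {
    if v.nanoseconds % NANOS_PER_MICRO != 0 {
        return None;
    }
    Some(SparkCalendarInterval {
        months: v.months,
        days: v.days,
        microseconds: v.nanoseconds / NANOS_PER_MICRO,
    })
}
```

On the design choice: the `CalendarIntervalType` path does not fold out-of-range microseconds into `days`. Spark applies the day and microsecond components separately (for example when adding an interval to a timestamp), so folding would change meaning rather than just layout. Splitting day-time intervals into days is only safe under the stated 24-hour-day assumption.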
### Expressions (depend on the type work)
Constructors, arithmetic, and field extraction (most already tracked individually):
- `make_interval` (#3099), `make_dt_interval` (#3098), `make_ym_interval`
(#3100), `try_make_interval` (#3103)
- `multiply_ym_interval` (#3102), `multiply_dt_interval` (#3101),
`divide_dt_interval` (#3096)
- `date_add_interval` (#3086), `timestamp_add_interval` (#3114),
`timestamp_add_ym_interval` (#3115), `time_add_interval` (#3121)
- `subtract_timestamps` (#3112), `subtract_dates` (#3094), `subtract_times`
(#3139), `datetime_sub` (#3134), `datetime_add`
- `extract` / `date_part` of interval fields

(The list of per-expression issues is derived from the `// datetime
functions` section of `FunctionRegistry`; each of those issues should link
back to this umbrella.)
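
To illustrate what a native kernel in this family would need to get right, here is a hedged Rust sketch of `make_dt_interval`, `make_ym_interval`, and field extraction for the two ANSI interval types. Function names and signatures are hypothetical. The `secs` argument is simplified from Spark's decimal to a whole number of microseconds. Returning `None` on overflow and using truncating, sign-preserving division are assumptions about the Spark semantics a real kernel would have to match, not a confirmed description of them.

```rust
const MICROS_PER_SECOND: i64 = 1_000_000;
const MICROS_PER_MINUTE: i64 = 60 * MICROS_PER_SECOND;
const MICROS_PER_HOUR: i64 = 60 * MICROS_PER_MINUTE;
const MICROS_PER_DAY: i64 = 24 * MICROS_PER_HOUR;
const MONTHS_PER_YEAR: i32 = 12;

/// Hypothetical scalar kernel for `make_dt_interval`, producing the i64
/// microseconds stored by DayTimeIntervalType. `secs_micros` stands in for
/// Spark's decimal seconds. Returns None instead of wrapping on overflow.
pub fn make_dt_interval(days: i32, hours: i32, mins: i32, secs_micros: i64) -> Option<i64> {
    let d = i64::from(days).checked_mul(MICROS_PER_DAY)?;
    let h = i64::from(hours).checked_mul(MICROS_PER_HOUR)?;
    let m = i64::from(mins).checked_mul(MICROS_PER_MINUTE)?;
    d.checked_add(h)?.checked_add(m)?.checked_add(secs_micros)
}

/// Hypothetical scalar kernel for `make_ym_interval`, producing the i32
/// total months stored by YearMonthIntervalType. None on overflow.
pub fn make_ym_interval(years: i32, months: i32) -> Option<i32> {
    years.checked_mul(MONTHS_PER_YEAR)?.checked_add(months)
}

/// Splits day-time micros into (days, hours, minutes, seconds-as-micros).
/// Rust's `/` and `%` truncate toward zero, so every field keeps the sign
/// of the interval.
pub fn extract_dt_fields(micros: i64) -> (i64, i64, i64, i64) {
    let days = micros / MICROS_PER_DAY;
    let hours = (micros % MICROS_PER_DAY) / MICROS_PER_HOUR;
    let minutes = (micros % MICROS_PER_HOUR) / MICROS_PER_MINUTE;
    let secs_micros = micros % MICROS_PER_MINUTE;
    (days, hours, minutes, secs_micros)
}

/// Splits year-month months into (years, months), again truncating.
pub fn extract_ym_fields(months: i32) -> (i32, i32) {
    (months / MONTHS_PER_YEAR, months % MONTHS_PER_YEAR)
}
```

Under these assumptions, `extract_dt_fields` inverts `make_dt_interval` for normalized, non-overflowing inputs (hours below 24, minutes and seconds below 60, all fields sharing one sign). For example, `make_dt_interval(1, 2, 3, 4_500_000)` gives `Some(93_784_500_000)`, and extracting from that value yields `(1, 2, 3, 4_500_000)`.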
## Additional context
- Related: #4418 ([EPIC] date/time expressions), #4506 ([EPIC] codegen
dispatch for Incompatible expressions).
- Until the type work lands, none of the interval expressions can be made
  Arrow-native or dispatched through codegen; they must fall back to Spark.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]