andygrove opened a new issue, #4540:
URL: https://github.com/apache/datafusion-comet/issues/4540

   ## What is the problem the feature request solves?
   
   Comet has no support for Spark's interval data types:
   
   - `CalendarIntervalType` (months + days + microseconds)
   - `YearMonthIntervalType` (ANSI `INTERVAL YEAR TO MONTH`)
   - `DayTimeIntervalType` (ANSI `INTERVAL DAY TO SECOND`)
   
   Because the types are unsupported, every expression that produces or 
consumes an interval falls back to Spark, and any query carrying an interval 
column through a Comet operator falls back as well. 
`CometBatchKernelCodegen.isSupportedDataType` also rejects these types, so 
interval expressions cannot even be routed through the JVM codegen dispatcher 
(see #4506 / #4538). Interval support is therefore a genuine arrow-native gap 
with no stopgap.
   
   This issue tracks the foundational type support plus the dependent 
expression family. It is the prerequisite for the already-filed per-expression 
requests below.
   
   ## Describe the potential solution
   
   ### Type support (prerequisite)
   
   - Map the three Spark interval types to Arrow:
     - `YearMonthIntervalType` -> Arrow `Interval(YearMonth)`
     - `DayTimeIntervalType` -> Arrow `Interval(MonthDayNano)` / `Duration` 
(decide on a representation that round-trips with Spark's microsecond storage)
     - `CalendarIntervalType` -> Arrow `Interval(MonthDayNano)` (Spark stores 
months/days/micros)
   - Wire the types through the `CometVector` hierarchy, FFI import/export 
(`NativeUtil` / `scan.rs`), and `serializeDataType` in `QueryPlanSerde`.
   - Allow these types in `CometBatchKernelCodegen.isSupportedDataType` once 
the FFI path is correct, so codegen dispatch can also cover interval 
expressions.
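
To illustrate the round-trip concern in the mapping above, here is a minimal Rust sketch of converting a months/days/microseconds value to a months/days/nanoseconds value and back. The struct and function names are hypothetical stand-ins written for this example; they are not Comet or arrow-rs types. The sketch assumes the only difference between the two layouts is the unit of the sub-day field.

```rust
/// Hypothetical stand-in for a Spark `CalendarIntervalType` value
/// (months + days + microseconds). Not a real Comet type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SparkCalendarInterval {
    pub months: i32,
    pub days: i32,
    pub microseconds: i64,
}

/// Hypothetical stand-in for an Arrow `Interval(MonthDayNano)` value
/// (months + days + nanoseconds). Not the arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

const NANOS_PER_MICRO: i64 = 1_000;

/// Spark -> Arrow. Returns `None` when scaling microseconds to
/// nanoseconds would overflow `i64`.
pub fn to_month_day_nano(v: SparkCalendarInterval) -> Option<MonthDayNano> {
    let nanoseconds = v.microseconds.checked_mul(NANOS_PER_MICRO)?;
    Some(MonthDayNano {
        months: v.months,
        days: v.days,
        nanoseconds,
    })
}

/// Arrow -> Spark. Returns `None` when the nanosecond field is not a
/// whole number of microseconds, since that precision cannot be kept.
pub fn from_month_day_nano(v: MonthDayNano) -> Option<SparkCalendarInterval> {
    if v.nanoseconds % NANOS_PER_MICRO != 0 {
        return None;
    }
    Some(SparkCalendarInterval {
        months: v.months,
        days: v.days,
        microseconds: v.nanoseconds / NANOS_PER_MICRO,
    })
}
```

The sketch shows that a nanosecond-based layout cannot hold every 64-bit microsecond value. That is one reason a microsecond-unit `Duration` may be worth considering for `DayTimeIntervalType`, since it needs no scaling.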
   
   ### Expressions (depend on the type work)
   
   Constructors and arithmetic already tracked individually:
   
   - `make_interval` (#3099), `make_dt_interval` (#3098), `make_ym_interval` 
(#3100), `try_make_interval` (#3103)
   - `multiply_ym_interval` (#3102), `multiply_dt_interval` (#3101), 
`divide_dt_interval` (#3096)
   - `date_add_interval` (#3086), `timestamp_add_interval` (#3114), 
`timestamp_add_ym_interval` (#3115), `time_add_interval` (#3121)
   - `subtract_timestamps` (#3112), `subtract_dates` (#3094), `subtract_times` 
(#3139), `datetime_sub` (#3134), `datetime_add`
   - `extract` / `date_part` of interval fields
   
   (The list of per-expression issues is derived from the `// datetime 
functions` section of `FunctionRegistry`; this umbrella should be linked from 
each.)
   
   ## Additional context
   
   - Related: #4418 ([EPIC] date/time expressions), #4506 ([EPIC] codegen 
dispatch for Incompatible expressions).
   - Until the type work lands, none of the interval expressions can be made 
arrow-native or dispatched; they must fall back.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

