Re: [PR] feat: support interval types and make_ym_interval / make_dt_interval [datafusion-comet]

via GitHub Wed, 01 Jul 2026 08:45:05 -0700


mbutrovich commented on code in PR #4541:
URL: https://github.com/apache/datafusion-comet/pull/4541#discussion_r3507285809



##########
spark/src/main/scala/org/apache/spark/sql/comet/util/Utils.scala:
##########
@@ -108,6 +108,7 @@ object Utils extends CometTypeShim with Logging {
     case yi: ArrowType.Interval if yi.getUnit == IntervalUnit.YEAR_MONTH =>
       YearMonthIntervalType()
     case di: ArrowType.Interval if di.getUnit == IntervalUnit.DAY_TIME => 
DayTimeIntervalType()
+    case d: ArrowType.Duration if d.getUnit == TimeUnit.MICROSECOND => 
DayTimeIntervalType()

Review Comment:
   Observation, no change requested. `fromArrowType` now maps both 
`Interval(DayTime)` (line 110) and `Duration(Microsecond)` (line 111) to 
`DayTimeIntervalType`, while `toArrowType` only ever emits 
`Duration(Microsecond)`. So the `Interval(DayTime)` reverse case is dead for 
Comet-produced data and is presumably kept for externally-produced Arrow. That 
is fine and consistent with the precision rationale, just flagging the 
asymmetry in case it was unintentional.



##########
spark/src/main/scala/org/apache/spark/sql/comet/util/Utils.scala:
##########
@@ -151,6 +152,10 @@ object Utils extends CometTypeShim with Logging {
       case NullType => ArrowType.Null.INSTANCE
       case dt if isTimeType(dt) =>
         new ArrowType.Time(TimeUnit.NANOSECOND, 64)
+      case _: YearMonthIntervalType => new 
ArrowType.Interval(IntervalUnit.YEAR_MONTH)
+      // Spark stores DayTimeIntervalType as microseconds in an int64, 
matching Arrow
+      // Duration(Microsecond) rather than the lossy Interval(DayTime) {days, 
millis} layout.
+      case _: DayTimeIntervalType => new 
ArrowType.Duration(TimeUnit.MICROSECOND)

Review Comment:
   Verified correct, no change needed for this PR. The representation choices 
match Spark's internal storage exactly: `YearMonthIntervalType` is int32 
months, which maps cleanly to `Interval(YearMonth)`; `DayTimeIntervalType` is 
an int64 microsecond count, which round-trips faithfully through 
`Duration(Microsecond)`. The comment correctly notes that `Interval(DayTime)`'s 
`{days, millis}` layout would lose microsecond precision, so avoiding it is 
right.
   
   One forward-looking note, not a blocker. Both mappings drop Spark's interval 
field qualifiers. `YearMonthIntervalType(YEAR, YEAR)` and 
`YearMonthIntervalType(YEAR, MONTH)` both serialize to the same 
`Interval(YearMonth)`, and on the way back `fromArrowType` produces the default 
`YearMonthIntervalType()` (YEAR TO MONTH); same story for `DayTimeIntervalType` 
start/end fields. This is harmless here because the only expressions wired are 
`make_ym_interval` and `make_dt_interval`, both of which Spark types with the 
default fields (`YearMonthIntervalType()`, `DayTimeIntervalType()`), and 
interval columns are not scanned yet. But once #4540 adds interval scans or 
field-qualified interval handling, this normalization will change a column's 
declared type. Worth a tracking note so it is not lost.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: support interval types and make_ym_interval / make_dt_interval [datafusion-comet]

Reply via email to