andygrove commented on PR #4660:
URL:
https://github.com/apache/datafusion-comet/pull/4660#issuecomment-4800443271
Thanks @marvelshan. The instinct here is good. Separating the genuine
timestamp-range incompatibility from the format-pattern caveat is the right
idea, and pulling the format text out of `getIncompatibleReasons()` so that it
only describes what `allowIncompatible=true` actually changes (the native
`to_char` path) is a real improvement. The `from_unix_time_enabled.sql` cleanup
is nice too, since the default pattern really does run natively under
`allowIncompatible=true` and asserting that with `query` is more accurate.
A few things I would like to work through before this lands.
**1. The code keeps this `Incompatible`, not `Unsupported`.** The title and
the description say `getSupportLevel` now returns `Unsupported` for non-default
patterns, but both branches return `Incompatible`:
```scala
override def getSupportLevel(expr: FromUnixTime): SupportLevel = {
if (expr.format != Literal(TimestampFormatter.defaultPattern())) {
Incompatible(Some("Only the default datetime pattern ... is supported
natively"))
} else {
Incompatible(None)
}
}
```
I actually think keeping it `Incompatible` is correct. `CometFromUnixTime`
is a `CodegenDispatchFallback`, so non-default formats are handled by the JVM
codegen dispatcher rather than being truly unsupported. If this returned
`Unsupported`, the framework would skip the dispatcher and force a full fall
back to Spark, which would be a regression. So the code looks right, it is the
title and the "What changes are included" section that need to be updated to
match.
**2. `getUnsupportedReasons()` will read as self-contradictory in the
docs.** `GenerateDocs` calls `getUnsupportedReasons()` unconditionally and
never consults `getSupportLevel`, so this entry renders under the heading "The
following cases are not supported by Comet:", while the bullet text itself says
non-default patterns are handled by the codegen dispatcher. So the generated
page says a case is not supported and handled in the same sentence, and that
does not match runtime behavior where non-default formats run in Comet via the
dispatcher by default. `date_format` has the same
native-allow-list-plus-dispatcher shape and documents it in
`getCompatibleNotes()` (`datetime.scala:715`). Could the format caveat move
there instead? That keeps `getIncompatibleReasons()` as just the timestamp
range, which is the part of this change I like, and avoids the misleading
"Unsupported" heading.
**3. The non-default test downgrade looks inconsistent.** In
`from_unix_time.sql`, `from_unixtime(t, 'yyyy-MM-dd')` moved from `query` to
`query spark_answer_only`, but the default-pattern `from_unixtime(t)` stayed
`query`. At `allowIncompatible=false` both take the same codegen-dispatch path,
so if the default-pattern one passes as native, the non-default one should too.
Could those stay `query` so they keep asserting the dispatcher holds them in
Comet? If `query` actually fails for the non-default case, that would be worth
surfacing, since it would mean the dispatcher is not covering it the way the
new reason text claims.
**4. Minor.** `getSupportLevel` calls `TimestampFormatter.defaultPattern()`
with parens while `convert()` just below calls `defaultPattern` without. The
parens form matches the Spark definition, so it is fine, but the two call sites
reading differently in the same object is worth tidying. Also a heads up that
#4693 also edits this serde's `getSupportLevel` and `getIncompatibleReasons`
for collation, so whichever of the two merges second will need a rebase.
The `getIncompatibleReasons()` split and the enabled-test fixes are solid.
The main rework is routing the format caveat through `getCompatibleNotes()`
rather than `getUnsupportedReasons()`, and squaring the title and description
with the code.
*Disclosure: this review was assisted by an AI tool (Anthropic Claude via
Claude Code). I read the diff, the linked issues, and the related serde code
before forming these comments, and I take responsibility for them.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]