andygrove opened a new issue, #4575:
URL: https://github.com/apache/datafusion-comet/issues/4575

   ### Describe the bug
   
   `CometFromUnixTime` currently reports a single support level of 
`Incompatible(None)` and folds two unrelated limitations into one 
`getIncompatibleReasons()` string:
   
   ```scala
   override def getIncompatibleReasons(): Seq[String] = Seq(
     "Only supports the default datetime format pattern `yyyy-MM-dd HH:mm:ss`." 
+
       " DataFusion's valid timestamp range differs from Spark" +
       " (https://github.com/apache/datafusion/issues/16594)")
   
   override def getSupportLevel(expr: FromUnixTime): SupportLevel = 
Incompatible(None)
   ```
   
   (`spark/src/main/scala/org/apache/comet/serde/unixtime.scala`)
   
   These are two different things:
   
   1. **DataFusion's valid timestamp range differs from Spark.** This is a 
genuine incompatibility. It applies even to the default format pattern, where 
Comet executes natively but can produce different results outside DataFusion's 
supported range. `Incompatible` is the correct classification for this.
   2. **Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported.** 
A non-default pattern cannot be executed at all. `convert` falls back to Spark 
via `withFallbackReason`. This is an *unsupported* limitation, not an 
incompatibility.
   
   Reporting the unsupported format pattern as an incompatible reason is the 
misclassification noted in the third bullet of #4074. There is no 
`getUnsupportedReasons()` to surface the format limitation properly, so the 
generated compatibility docs describe a fall-back-only case as an 
incompatibility.
   
   This is a classification and documentation accuracy problem, not a 
correctness bug. Runtime behavior is already safe. By default 
(`spark.comet.expr.FromUnixTime.allowIncompatible=false`) the whole expression 
falls back to Spark, and with that flag enabled the default format runs 
natively while non-default formats fall back.
   
   ### Steps to reproduce
   
   Look at `CometFromUnixTime` in 
`spark/src/main/scala/org/apache/comet/serde/unixtime.scala` and the generated 
`from_unixtime` entry on the datetime compatibility page. The format limitation 
is rendered as an incompatibility bullet rather than as an unsupported 
limitation.
   
   ### Expected behavior
   
   `getSupportLevel` receives the concrete `FromUnixTime`, so it can 
distinguish the two cases. For example:
   
   ```scala
   override def getSupportLevel(expr: FromUnixTime): SupportLevel =
     if (expr.format != Literal(TimestampFormatter.defaultPattern)) {
       Unsupported(Some("Only the default datetime pattern `yyyy-MM-dd 
HH:mm:ss` is supported"))
     } else {
       Incompatible(None) // DataFusion's valid timestamp range differs from 
Spark
     }
   ```
   
   Then split the reason strings. Keep the DataFusion timestamp-range 
difference in `getIncompatibleReasons()`, and add a `getUnsupportedReasons()` 
for the non-default-format limitation.
   
   One thing to confirm during implementation: check how `GenerateDocs` invokes 
`getSupportLevel`. If it calls it without a concrete `FromUnixTime` instance, a 
format-dependent `getSupportLevel` needs to degrade sensibly for the docs path.
   
   ### Additional context
   
   Follow-up to the third item in #4074. The `CometSum` item from that issue is 
addressed in #4111. The `CometStringRepeat` item has been fixed separately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to