This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
     new c4da3c91d chore: improve cast documentation to add support per eval mode (#3056)
c4da3c91d is described below

commit c4da3c91d672e6bc5d41959f6d60a671cd3dfd8a
Author: B Vadlamani <[email protected]>
AuthorDate: Tue Jan 13 11:06:02 2026 -0800

    chore: improve cast documentation to add support per eval mode (#3056)
---
 docs/source/user-guide/latest/compatibility.md     | 194 ++++++++++-----------
 .../main/scala/org/apache/comet/GenerateDocs.scala | 106 +++++++----
 2 files changed, 164 insertions(+), 136 deletions(-)

diff --git a/docs/source/user-guide/latest/compatibility.md b/docs/source/user-guide/latest/compatibility.md
index 31270404c..0ca6f8ea9 100644
--- a/docs/source/user-guide/latest/compatibility.md
+++ b/docs/source/user-guide/latest/compatibility.md
@@ -73,122 +73,118 @@ should not be used in production. The feature will be enabled in a future releas
 
 Cast operations in Comet fall into three levels of support:
 
-- **Compatible**: The results match Apache Spark
-- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where some inputs
+- **C (Compatible)**: The results match Apache Spark
+- **I (Incompatible)**: The results may match Apache Spark for some inputs, but there are known issues where some inputs
   will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting
   `spark.comet.expression.Cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet, but this is not
   recommended for production use.
-- **Unsupported**: Comet does not provide a native version of this cast expression and the query stage will fall back to
+- **U (Unsupported)**: Comet does not provide a native version of this cast expression and the query stage will fall back to
   Spark.
+- **N/A**: Spark does not support this cast.
 
-### Compatible Casts
+### Legacy Mode
 
-The following cast operations are generally compatible with Spark except for the differences noted here.
+<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
+
+<!--BEGIN:CAST_LEGACY_TABLE-->
+<!-- prettier-ignore-start -->
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
+<!-- prettier-ignore-end -->
+
+**Notes:**
+
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+  or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: Not all valid formats are supported
+<!--END:CAST_LEGACY_TABLE-->
+
+### Try Mode
 
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 
-<!--BEGIN:COMPAT_CAST_TABLE-->
+<!--BEGIN:CAST_TRY_TABLE-->
 <!-- prettier-ignore-start -->
-| From Type | To Type | Notes |
-|-|-|-|
-| boolean | byte |  |
-| boolean | short |  |
-| boolean | integer |  |
-| boolean | long |  |
-| boolean | float |  |
-| boolean | double |  |
-| boolean | string |  |
-| byte | boolean |  |
-| byte | short |  |
-| byte | integer |  |
-| byte | long |  |
-| byte | float |  |
-| byte | double |  |
-| byte | decimal |  |
-| byte | string |  |
-| short | boolean |  |
-| short | byte |  |
-| short | integer |  |
-| short | long |  |
-| short | float |  |
-| short | double |  |
-| short | decimal |  |
-| short | string |  |
-| integer | boolean |  |
-| integer | byte |  |
-| integer | short |  |
-| integer | long |  |
-| integer | float |  |
-| integer | double |  |
-| integer | decimal |  |
-| integer | string |  |
-| long | boolean |  |
-| long | byte |  |
-| long | short |  |
-| long | integer |  |
-| long | float |  |
-| long | double |  |
-| long | decimal |  |
-| long | string |  |
-| float | boolean |  |
-| float | byte |  |
-| float | short |  |
-| float | integer |  |
-| float | long |  |
-| float | double |  |
-| float | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 |
-| double | boolean |  |
-| double | byte |  |
-| double | short |  |
-| double | integer |  |
-| double | long |  |
-| double | float |  |
-| double | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 |
-| decimal | boolean |  |
-| decimal | byte |  |
-| decimal | short |  |
-| decimal | integer |  |
-| decimal | long |  |
-| decimal | float |  |
-| decimal | double |  |
-| decimal | decimal |  |
-| decimal | string | There can be formatting differences in some case due to Spark using scientific notation where Comet does not |
-| string | boolean |  |
-| string | byte |  |
-| string | short |  |
-| string | integer |  |
-| string | long |  |
-| string | float |  |
-| string | double |  |
-| string | binary |  |
-| string | date | Only supports years between 262143 BC and 262142 AD |
-| binary | string |  |
-| date | string |  |
-| timestamp | long |  |
-| timestamp | string |  |
-| timestamp | date |  |
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
 <!-- prettier-ignore-end -->
-<!--END:COMPAT_CAST_TABLE-->
 
-### Incompatible Casts
+**Notes:**
 
-The following cast operations are not compatible with Spark for all inputs and are disabled by default.
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+  or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: Not all valid formats are supported
+<!--END:CAST_TRY_TABLE-->
+
+### ANSI Mode
 
 <!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
 
-<!--BEGIN:INCOMPAT_CAST_TABLE-->
+<!--BEGIN:CAST_ANSI_TABLE-->
 <!-- prettier-ignore-start -->
-| From Type | To Type | Notes |
-|-|-|-|
-| float | decimal  | There can be rounding differences |
-| double | decimal  | There can be rounding differences |
-| string | decimal  | Does not support fullwidth unicode digits (e.g \\uFF10)
-or strings containing null bytes (e.g \\u0000) |
-| string | timestamp  | Not all valid formats are supported |
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
 <!-- prettier-ignore-end -->
-<!--END:INCOMPAT_CAST_TABLE-->
 
-### Unsupported Casts
+**Notes:**
+
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+  or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: ANSI mode not supported
+<!--END:CAST_ANSI_TABLE-->
 
-Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the
-[tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
+See the [tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
diff --git a/spark/src/main/scala/org/apache/comet/GenerateDocs.scala b/spark/src/main/scala/org/apache/comet/GenerateDocs.scala
index 6ac01dbf7..574ff0109 100644
--- a/spark/src/main/scala/org/apache/comet/GenerateDocs.scala
+++ b/spark/src/main/scala/org/apache/comet/GenerateDocs.scala
@@ -21,13 +21,14 @@ package org.apache.comet
 
 import java.io.{BufferedOutputStream, BufferedReader, FileOutputStream, FileReader}
 
+import scala.collection.mutable
 import scala.collection.mutable.ListBuffer
 
 import org.apache.spark.sql.catalyst.expressions.Cast
 
 import org.apache.comet.CometConf.COMET_ONHEAP_MEMORY_OVERHEAD
 import org.apache.comet.expressions.{CometCast, CometEvalMode}
-import org.apache.comet.serde.{Compatible, Incompatible, QueryPlanSerde}
+import org.apache.comet.serde.{Compatible, Incompatible, QueryPlanSerde, Unsupported}
 
 /**
  * Utility for generating markdown documentation from the configs.
@@ -109,48 +110,79 @@ object GenerateDocs {
     val w = new BufferedOutputStream(new FileOutputStream(filename))
     for (line <- lines) {
       w.write(s"${line.stripTrailing()}\n".getBytes)
-      if (line.trim == "<!--BEGIN:COMPAT_CAST_TABLE-->") {
-        w.write("<!-- prettier-ignore-start -->\n".getBytes)
-        w.write("| From Type | To Type | Notes |\n".getBytes)
-        w.write("|-|-|-|\n".getBytes)
-        for (fromType <- CometCast.supportedTypes) {
-          for (toType <- CometCast.supportedTypes) {
-            if (Cast.canCast(fromType, toType) && (fromType != toType || fromType.typeName
-                .contains("decimal"))) {
-              val fromTypeName = fromType.typeName.replace("(10,2)", "")
-              val toTypeName = toType.typeName.replace("(10,2)", "")
-              CometCast.isSupported(fromType, toType, None, CometEvalMode.LEGACY) match {
-                case Compatible(notes) =>
-                  val notesStr = notes.getOrElse("").trim
-                  w.write(s"| $fromTypeName | $toTypeName | $notesStr |\n".getBytes)
-                case _ =>
+      if (line.trim == "<!--BEGIN:CAST_LEGACY_TABLE-->") {
+        writeCastMatrixForMode(w, CometEvalMode.LEGACY)
+      } else if (line.trim == "<!--BEGIN:CAST_TRY_TABLE-->") {
+        writeCastMatrixForMode(w, CometEvalMode.TRY)
+      } else if (line.trim == "<!--BEGIN:CAST_ANSI_TABLE-->") {
+        writeCastMatrixForMode(w, CometEvalMode.ANSI)
+      }
+    }
+    w.close()
+  }
+
+  private def writeCastMatrixForMode(w: BufferedOutputStream, mode: CometEvalMode.Value): Unit = {
+    val sortedTypes = CometCast.supportedTypes.sortBy(_.typeName)
+    val typeNames = sortedTypes.map(_.typeName.replace("(10,2)", ""))
+
+    // Collect annotations for meaningful notes
+    val annotations = mutable.ListBuffer[(String, String, String)]()
+
+    w.write("<!-- prettier-ignore-start -->\n".getBytes)
+
+    // Write header row
+    w.write("| |".getBytes)
+    for (toTypeName <- typeNames) {
+      w.write(s" $toTypeName |".getBytes)
+    }
+    w.write("\n".getBytes)
+
+    // Write separator row
+    w.write("|---|".getBytes)
+    for (_ <- typeNames) {
+      w.write("---|".getBytes)
+    }
+    w.write("\n".getBytes)
+
+    // Write data rows
+    for ((fromType, fromTypeName) <- sortedTypes.zip(typeNames)) {
+      w.write(s"| $fromTypeName |".getBytes)
+      for ((toType, toTypeName) <- sortedTypes.zip(typeNames)) {
+        val cell = if (fromType == toType) {
+          "-"
+        } else if (!Cast.canCast(fromType, toType)) {
+          "N/A"
+        } else {
+          val supportLevel = CometCast.isSupported(fromType, toType, None, mode)
+          supportLevel match {
+            case Compatible(notes) =>
+              notes.filter(_.trim.nonEmpty).foreach { note =>
+                annotations += ((fromTypeName, toTypeName, note.trim.replace("(10,2)", "")))
               }
-            }
-          }
-        }
-        w.write("<!-- prettier-ignore-end -->\n".getBytes)
-      } else if (line.trim == "<!--BEGIN:INCOMPAT_CAST_TABLE-->") {
-        w.write("<!-- prettier-ignore-start -->\n".getBytes)
-        w.write("| From Type | To Type | Notes |\n".getBytes)
-        w.write("|-|-|-|\n".getBytes)
-        for (fromType <- CometCast.supportedTypes) {
-          for (toType <- CometCast.supportedTypes) {
-            if (Cast.canCast(fromType, toType) && fromType != toType) {
-              val fromTypeName = fromType.typeName.replace("(10,2)", "")
-              val toTypeName = toType.typeName.replace("(10,2)", "")
-              CometCast.isSupported(fromType, toType, None, CometEvalMode.LEGACY) match {
-                case Incompatible(notes) =>
-                  val notesStr = notes.getOrElse("").trim
-                  w.write(s"| $fromTypeName | $toTypeName  | $notesStr |\n".getBytes)
-                case _ =>
+              "C"
+            case Incompatible(notes) =>
+              notes.filter(_.trim.nonEmpty).foreach { note =>
+                annotations += ((fromTypeName, toTypeName, note.trim.replace("(10,2)", "")))
               }
-            }
+              "I"
+            case Unsupported(_) =>
+              "U"
           }
         }
-        w.write("<!-- prettier-ignore-end -->\n".getBytes)
+        w.write(s" $cell |".getBytes)
+      }
+      w.write("\n".getBytes)
+    }
+
+    w.write("<!-- prettier-ignore-end -->\n".getBytes)
+
+    // Write annotations if any
+    if (annotations.nonEmpty) {
+      w.write("\n**Notes:**\n".getBytes)
+      for ((from, to, note) <- annotations.distinct) {
+        w.write(s"- **$from -> $to**: $note\n".getBytes)
       }
     }
-    w.close()
   }
 
   /** Read file into memory */

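For readers who want to experiment with the matrix layout outside of Comet, the rendering logic in the diff above can be approximated as a pure function. This is only a minimal sketch: `CastMatrixSketch`, `Support`, `render`, and the `canCast` and `support` parameters are hypothetical names that stand in for `Cast.canCast` and `CometCast.isSupported`, types are modelled as plain strings, and the notes list is left out.

```scala
object CastMatrixSketch {
  // Hypothetical stand-ins for the Compatible / Incompatible / Unsupported support levels.
  sealed trait Support
  case object Compatible extends Support
  case object Incompatible extends Support
  case object Unsupported extends Support

  /**
   * Render a from-type by to-type markdown matrix.
   * `canCast` and `support` are assumptions of this sketch, supplied by the caller.
   */
  def render(
      types: Seq[String],
      canCast: (String, String) => Boolean,
      support: (String, String) => Support): String = {
    val sorted = types.sorted
    // Header row: an empty corner cell followed by one column per target type.
    val header = sorted.mkString("| | ", " | ", " |")
    // Separator row: one cell for the row label plus one per target type.
    val separator = "|---|" + ("---|" * sorted.size)
    val rows = sorted.map { from =>
      val cells = sorted.map { to =>
        if (from == to) "-" // identity cast
        else if (!canCast(from, to)) "N/A" // not castable at all
        else
          support(from, to) match {
            case Compatible => "C"
            case Incompatible => "I"
            case Unsupported => "U"
          }
      }
      cells.mkString(s"| $from | ", " | ", " |")
    }
    (header +: separator +: rows).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    // Toy rules for illustration only; they are not Comet's real support matrix.
    val table = render(
      Seq("string", "boolean", "date"),
      (from, to) => !(from == "boolean" && to == "date"),
      (from, to) => if (from == "date" && to == "boolean") Unsupported else Compatible)
    println(table)
  }
}
```

Running `main` prints a three-by-three matrix in which boolean -> date is `N/A` and date -> boolean is `U`. Building the table as a string before writing it keeps the layout easy to check in isolation, whereas the committed code writes each cell straight to the output stream.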
