This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new c4da3c91d chore: improve cast documentation to add support per eval mode (#3056)
c4da3c91d is described below
commit c4da3c91d672e6bc5d41959f6d60a671cd3dfd8a
Author: B Vadlamani <[email protected]>
AuthorDate: Tue Jan 13 11:06:02 2026 -0800
chore: improve cast documentation to add support per eval mode (#3056)
---
docs/source/user-guide/latest/compatibility.md | 194 ++++++++++-----------
.../main/scala/org/apache/comet/GenerateDocs.scala | 106 +++++++----
2 files changed, 164 insertions(+), 136 deletions(-)
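This commit splits the cast compatibility docs into one matrix per eval mode (LEGACY, TRY, ANSI). To show why the modes are documented separately, the toy model below contrasts them for an overflowing long-to-int cast. It assumes Spark's usual semantics: legacy wraps around, try returns NULL, and ANSI raises an error. The names `EvalModeDemo`, `LegacyMode`, `TryMode`, `AnsiMode` and `castLongToInt` are hypothetical and are not part of Comet or Spark.

```scala
// Hypothetical toy model (not Comet or Spark code) of how the three cast
// eval modes treat an out-of-range long -> int cast.
object EvalModeDemo {
  sealed trait EvalMode
  case object LegacyMode extends EvalMode
  case object TryMode extends EvalMode
  case object AnsiMode extends EvalMode

  /** Toy long -> int cast. None stands for a SQL NULL result. */
  def castLongToInt(v: Long, mode: EvalMode): Option[Int] = {
    val fits = v >= Int.MinValue && v <= Int.MaxValue
    mode match {
      // Out-of-range values silently wrap around.
      case LegacyMode => Some(v.toInt)
      // Overflow becomes NULL.
      case TryMode => if (fits) Some(v.toInt) else None
      // Overflow is reported as an error.
      case AnsiMode =>
        if (fits) Some(v.toInt)
        else throw new ArithmeticException(s"$v is out of range for int")
    }
  }
}
```

With these definitions, `castLongToInt(Int.MaxValue.toLong + 1, LegacyMode)` yields `Some(-2147483648)`. The same input yields `None` under `TryMode` and throws under `AnsiMode`. A native cast has to reproduce whichever behavior the mode requires, so support can differ per mode.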
diff --git a/docs/source/user-guide/latest/compatibility.md b/docs/source/user-guide/latest/compatibility.md
index 31270404c..0ca6f8ea9 100644
--- a/docs/source/user-guide/latest/compatibility.md
+++ b/docs/source/user-guide/latest/compatibility.md
@@ -73,122 +73,118 @@ should not be used in production. The feature will be enabled in a future releas
Cast operations in Comet fall into three levels of support:
-- **Compatible**: The results match Apache Spark
-- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where some inputs
+- **C (Compatible)**: The results match Apache Spark
+- **I (Incompatible)**: The results may match Apache Spark for some inputs, but there are known issues where some inputs
will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting
`spark.comet.expression.Cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet, but this is not
recommended for production use.
-- **Unsupported**: Comet does not provide a native version of this cast expression and the query stage will fall back to
+- **U (Unsupported)**: Comet does not provide a native version of this cast expression and the query stage will fall back to
Spark.
+- **N/A**: Spark does not support this cast.
-### Compatible Casts
+### Legacy Mode
-The following cast operations are generally compatible with Spark except for the differences noted here.
+<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
+
+<!--BEGIN:CAST_LEGACY_TABLE-->
+<!-- prettier-ignore-start -->
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
+<!-- prettier-ignore-end -->
+
+**Notes:**
+
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+ or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: Not all valid formats are supported
+<!--END:CAST_LEGACY_TABLE-->
+
+### Try Mode
<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
-<!--BEGIN:COMPAT_CAST_TABLE-->
+<!--BEGIN:CAST_TRY_TABLE-->
<!-- prettier-ignore-start -->
-| From Type | To Type | Notes |
-|-|-|-|
-| boolean | byte | |
-| boolean | short | |
-| boolean | integer | |
-| boolean | long | |
-| boolean | float | |
-| boolean | double | |
-| boolean | string | |
-| byte | boolean | |
-| byte | short | |
-| byte | integer | |
-| byte | long | |
-| byte | float | |
-| byte | double | |
-| byte | decimal | |
-| byte | string | |
-| short | boolean | |
-| short | byte | |
-| short | integer | |
-| short | long | |
-| short | float | |
-| short | double | |
-| short | decimal | |
-| short | string | |
-| integer | boolean | |
-| integer | byte | |
-| integer | short | |
-| integer | long | |
-| integer | float | |
-| integer | double | |
-| integer | decimal | |
-| integer | string | |
-| long | boolean | |
-| long | byte | |
-| long | short | |
-| long | integer | |
-| long | float | |
-| long | double | |
-| long | decimal | |
-| long | string | |
-| float | boolean | |
-| float | byte | |
-| float | short | |
-| float | integer | |
-| float | long | |
-| float | double | |
-| float | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 |
-| double | boolean | |
-| double | byte | |
-| double | short | |
-| double | integer | |
-| double | long | |
-| double | float | |
-| double | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 |
-| decimal | boolean | |
-| decimal | byte | |
-| decimal | short | |
-| decimal | integer | |
-| decimal | long | |
-| decimal | float | |
-| decimal | double | |
-| decimal | decimal | |
-| decimal | string | There can be formatting differences in some case due to Spark using scientific notation where Comet does not |
-| string | boolean | |
-| string | byte | |
-| string | short | |
-| string | integer | |
-| string | long | |
-| string | float | |
-| string | double | |
-| string | binary | |
-| string | date | Only supports years between 262143 BC and 262142 AD |
-| binary | string | |
-| date | string | |
-| timestamp | long | |
-| timestamp | string | |
-| timestamp | date | |
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
<!-- prettier-ignore-end -->
-<!--END:COMPAT_CAST_TABLE-->
-### Incompatible Casts
+**Notes:**
-The following cast operations are not compatible with Spark for all inputs and are disabled by default.
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+ or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: Not all valid formats are supported
+<!--END:CAST_TRY_TABLE-->
+
+### ANSI Mode
<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
-<!--BEGIN:INCOMPAT_CAST_TABLE-->
+<!--BEGIN:CAST_ANSI_TABLE-->
<!-- prettier-ignore-start -->
-| From Type | To Type | Notes |
-|-|-|-|
-| float | decimal | There can be rounding differences |
-| double | decimal | There can be rounding differences |
-| string | decimal | Does not support fullwidth unicode digits (e.g \\uFF10)
-or strings containing null bytes (e.g \\u0000) |
-| string | timestamp | Not all valid formats are supported |
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
+| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
+| string | C | C | C | C | I | C | C | C | C | C | - | I |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
<!-- prettier-ignore-end -->
-<!--END:INCOMPAT_CAST_TABLE-->
-### Unsupported Casts
+**Notes:**
+
+- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
+ or strings containing null bytes (e.g \\u0000)
+- **string -> timestamp**: ANSI mode not supported
+<!--END:CAST_ANSI_TABLE-->
-Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the
-[tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
+See the [tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
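The support legend in compatibility.md implies a simple rule for each cast. C runs natively. I falls back to Spark unless `spark.comet.expression.Cast.allowIncompatible=true` is set. U always falls back. The sketch below encodes that rule. Only the config key comes from the docs. The `SupportLevel` case classes are stand-ins that mirror the names imported in GenerateDocs.scala, and `CastFallbackDemo`/`runsNatively` are hypothetical helpers, not Comet's planner logic. N/A cells are left out because Spark itself does not support those casts.

```scala
// Hypothetical sketch of the fallback rule described by the C/I/U legend.
object CastFallbackDemo {
  // Stand-ins that only mirror the names of Comet's serde support levels.
  sealed trait SupportLevel
  final case class Compatible(notes: Option[String] = None) extends SupportLevel
  final case class Incompatible(notes: Option[String] = None) extends SupportLevel
  final case class Unsupported(notes: Option[String] = None) extends SupportLevel

  // Config key as documented in compatibility.md.
  val AllowIncompatibleKey = "spark.comet.expression.Cast.allowIncompatible"

  /** True if the cast may run natively in Comet; false means fall back to Spark. */
  def runsNatively(level: SupportLevel, conf: Map[String, String]): Boolean =
    level match {
      case Compatible(_) => true
      // Incompatible casts fall back by default and run natively only when opted in.
      case Incompatible(_) =>
        conf.get(AllowIncompatibleKey).exists(_.trim.equalsIgnoreCase("true"))
      case Unsupported(_) => false
    }
}
```

Keeping the decision to a pattern match over a sealed trait means the compiler warns if a new support level is added without a matching rule.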
diff --git a/spark/src/main/scala/org/apache/comet/GenerateDocs.scala b/spark/src/main/scala/org/apache/comet/GenerateDocs.scala
index 6ac01dbf7..574ff0109 100644
--- a/spark/src/main/scala/org/apache/comet/GenerateDocs.scala
+++ b/spark/src/main/scala/org/apache/comet/GenerateDocs.scala
@@ -21,13 +21,14 @@ package org.apache.comet
import java.io.{BufferedOutputStream, BufferedReader, FileOutputStream, FileReader}
+import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.comet.CometConf.COMET_ONHEAP_MEMORY_OVERHEAD
import org.apache.comet.expressions.{CometCast, CometEvalMode}
-import org.apache.comet.serde.{Compatible, Incompatible, QueryPlanSerde}
+import org.apache.comet.serde.{Compatible, Incompatible, QueryPlanSerde, Unsupported}
/**
* Utility for generating markdown documentation from the configs.
@@ -109,48 +110,79 @@ object GenerateDocs {
val w = new BufferedOutputStream(new FileOutputStream(filename))
for (line <- lines) {
w.write(s"${line.stripTrailing()}\n".getBytes)
- if (line.trim == "<!--BEGIN:COMPAT_CAST_TABLE-->") {
- w.write("<!-- prettier-ignore-start -->\n".getBytes)
- w.write("| From Type | To Type | Notes |\n".getBytes)
- w.write("|-|-|-|\n".getBytes)
- for (fromType <- CometCast.supportedTypes) {
- for (toType <- CometCast.supportedTypes) {
- if (Cast.canCast(fromType, toType) && (fromType != toType || fromType.typeName
- .contains("decimal"))) {
- val fromTypeName = fromType.typeName.replace("(10,2)", "")
- val toTypeName = toType.typeName.replace("(10,2)", "")
- CometCast.isSupported(fromType, toType, None, CometEvalMode.LEGACY) match {
- case Compatible(notes) =>
- val notesStr = notes.getOrElse("").trim
- w.write(s"| $fromTypeName | $toTypeName | $notesStr |\n".getBytes)
- case _ =>
+ if (line.trim == "<!--BEGIN:CAST_LEGACY_TABLE-->") {
+ writeCastMatrixForMode(w, CometEvalMode.LEGACY)
+ } else if (line.trim == "<!--BEGIN:CAST_TRY_TABLE-->") {
+ writeCastMatrixForMode(w, CometEvalMode.TRY)
+ } else if (line.trim == "<!--BEGIN:CAST_ANSI_TABLE-->") {
+ writeCastMatrixForMode(w, CometEvalMode.ANSI)
+ }
+ }
+ w.close()
+ }
+
+ private def writeCastMatrixForMode(w: BufferedOutputStream, mode: CometEvalMode.Value): Unit = {
+ val sortedTypes = CometCast.supportedTypes.sortBy(_.typeName)
+ val typeNames = sortedTypes.map(_.typeName.replace("(10,2)", ""))
+
+ // Collect annotations for meaningful notes
+ val annotations = mutable.ListBuffer[(String, String, String)]()
+
+ w.write("<!-- prettier-ignore-start -->\n".getBytes)
+
+ // Write header row
+ w.write("| |".getBytes)
+ for (toTypeName <- typeNames) {
+ w.write(s" $toTypeName |".getBytes)
+ }
+ w.write("\n".getBytes)
+
+ // Write separator row
+ w.write("|---|".getBytes)
+ for (_ <- typeNames) {
+ w.write("---|".getBytes)
+ }
+ w.write("\n".getBytes)
+
+ // Write data rows
+ for ((fromType, fromTypeName) <- sortedTypes.zip(typeNames)) {
+ w.write(s"| $fromTypeName |".getBytes)
+ for ((toType, toTypeName) <- sortedTypes.zip(typeNames)) {
+ val cell = if (fromType == toType) {
+ "-"
+ } else if (!Cast.canCast(fromType, toType)) {
+ "N/A"
+ } else {
+ val supportLevel = CometCast.isSupported(fromType, toType, None, mode)
+ supportLevel match {
+ case Compatible(notes) =>
+ notes.filter(_.trim.nonEmpty).foreach { note =>
+ annotations += ((fromTypeName, toTypeName, note.trim.replace("(10,2)", "")))
}
- }
- }
- }
- w.write("<!-- prettier-ignore-end -->\n".getBytes)
- } else if (line.trim == "<!--BEGIN:INCOMPAT_CAST_TABLE-->") {
- w.write("<!-- prettier-ignore-start -->\n".getBytes)
- w.write("| From Type | To Type | Notes |\n".getBytes)
- w.write("|-|-|-|\n".getBytes)
- for (fromType <- CometCast.supportedTypes) {
- for (toType <- CometCast.supportedTypes) {
- if (Cast.canCast(fromType, toType) && fromType != toType) {
- val fromTypeName = fromType.typeName.replace("(10,2)", "")
- val toTypeName = toType.typeName.replace("(10,2)", "")
- CometCast.isSupported(fromType, toType, None, CometEvalMode.LEGACY) match {
- case Incompatible(notes) =>
- val notesStr = notes.getOrElse("").trim
- w.write(s"| $fromTypeName | $toTypeName | $notesStr |\n".getBytes)
- case _ =>
+ "C"
+ case Incompatible(notes) =>
+ notes.filter(_.trim.nonEmpty).foreach { note =>
+ annotations += ((fromTypeName, toTypeName, note.trim.replace("(10,2)", "")))
}
- }
+ "I"
+ case Unsupported(_) =>
+ "U"
}
}
- w.write("<!-- prettier-ignore-end -->\n".getBytes)
+ w.write(s" $cell |".getBytes)
+ }
+ w.write("\n".getBytes)
+ }
+
+ w.write("<!-- prettier-ignore-end -->\n".getBytes)
+
+ // Write annotations if any
+ if (annotations.nonEmpty) {
+ w.write("\n**Notes:**\n".getBytes)
+ for ((from, to, note) <- annotations.distinct) {
+ w.write(s"- **$from -> $to**: $note\n".getBytes)
}
}
- w.close()
}
/** Read file into memory */
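`writeCastMatrixForMode` depends on Spark's `DataType`, `Cast.canCast` and `CometCast.isSupported`, so it cannot be run in isolation. Below is a simplified, self-contained sketch of the same layout. It treats type names as plain strings and uses a caller-supplied lookup function in place of the `canCast`/`isSupported` checks. `CastMatrixDemo`, `Cell` and `render` are hypothetical names.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified version of the matrix layout emitted by
// writeCastMatrixForMode: header row, separator row, one row per source
// type, then a deduplicated list of notes.
object CastMatrixDemo {
  /** Cell code ("C", "I", "U" or "N/A") plus an optional note. */
  final case class Cell(code: String, note: Option[String] = None)

  def render(types: Seq[String], lookup: (String, String) => Cell): String = {
    val sorted = types.sorted
    val notes = ListBuffer[(String, String, String)]()
    val sb = new StringBuilder
    sb.append("<!-- prettier-ignore-start -->\n")
    // Header row: an empty corner cell followed by the target types.
    sb.append(sorted.mkString("| | ", " | ", " |\n"))
    // Separator row: one "---" column per target type plus the row-label column.
    sb.append("|---|" + ("---|" * sorted.size) + "\n")
    for (from <- sorted) {
      val cells = sorted.map { to =>
        if (from == to) "-" // identity casts are not classified
        else {
          val cell = lookup(from, to)
          // Keep only non-blank notes, as the real generator does.
          cell.note.map(_.trim).filter(_.nonEmpty).foreach(n => notes += ((from, to, n)))
          cell.code
        }
      }
      sb.append(cells.mkString(s"| $from | ", " | ", " |\n"))
    }
    sb.append("<!-- prettier-ignore-end -->\n")
    if (notes.nonEmpty) {
      sb.append("\n**Notes:**\n")
      for ((f, t, n) <- notes.distinct) sb.append(s"- **$f -> $t**: $n\n")
    }
    sb.toString
  }
}
```

One difference from the real method: `writeCastMatrixForMode` writes N/A itself when `Cast.canCast` is false, whereas this sketch leaves that decision to the lookup function.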