This is an automated email from the ASF dual-hosted git repository.
andygrove pushed a commit to branch branch-0.16
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/branch-0.16 by this push:
new 4b72c6899 generate docs
4b72c6899 is described below
commit 4b72c6899ba512a184beb69f3a575e3e3b9204d8
Author: Andy Grove <[email protected]>
AuthorDate: Thu May 7 18:36:57 2026 -0600
generate docs
---
.../latest/compatibility/expressions/aggregate.md | 30 ++
.../latest/compatibility/expressions/array.md | 56 ++++
.../latest/compatibility/expressions/cast.md | 72 +++++
.../latest/compatibility/expressions/datetime.md | 78 +++++
.../latest/compatibility/expressions/map.md | 7 +
.../latest/compatibility/expressions/math.md | 6 +
.../latest/compatibility/expressions/misc.md | 30 ++
.../latest/compatibility/expressions/string.md | 95 ++++++
.../latest/compatibility/expressions/struct.md | 26 ++
docs/source/user-guide/latest/configs.md | 328 +++++++++++++++++++++
10 files changed, 728 insertions(+)
diff --git
a/docs/source/user-guide/latest/compatibility/expressions/aggregate.md
b/docs/source/user-guide/latest/compatibility/expressions/aggregate.md
index 8d15eea43..317decbd7 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/aggregate.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/aggregate.md
@@ -20,4 +20,34 @@ under the License.
# Aggregate Expressions
<!--BEGIN:EXPR_COMPAT[aggregate]-->
+
+## Average
+
+The following incompatibilities cause `Average` to fall back to Spark by
default. Set `spark.comet.expression.Average.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Falls back to Spark in ANSI mode. Supports all numeric inputs except decimal
types.
+
+## CollectSet
+
+The following incompatibilities cause `CollectSet` to fall back to Spark by
default. Set `spark.comet.expression.CollectSet.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Comet deduplicates NaN values (treats `NaN == NaN`) while Spark treats each
NaN as a distinct value. When `spark.comet.exec.strictFloatingPoint=true`,
`collect_set` on floating-point types falls back to Spark unless
`spark.comet.expression.CollectSet.allowIncompatible=true` is set.
+
+## First
+
+The following differences from Spark are always present and do not require any
additional configuration:
+
+- This function is not deterministic. Results may not match Spark.
+
+## Last
+
+The following differences from Spark are always present and do not require any
additional configuration:
+
+- This function is not deterministic. Results may not match Spark.
+
+## Sum
+
+The following incompatibilities cause `Sum` to fall back to Spark by default.
Set `spark.comet.expression.Sum.allowIncompatible=true` to enable Comet
acceleration despite these differences.
+
+- Falls back to Spark in ANSI mode.
<!--END:EXPR_COMPAT-->
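
The `CollectSet` note above describes a difference in how NaN values are deduplicated. The following is a minimal Python sketch of the two behaviours as the note words them. It is not Comet or Spark code: the function names `collect_set_nan_equal` and `collect_set_nan_distinct` are hypothetical, and the output order carries no meaning.

```python
import math


def _is_nan(value):
    # True only for float NaN; other values are compared normally.
    return isinstance(value, float) and math.isnan(value)


def collect_set_nan_equal(values):
    """Deduplicate, treating every NaN as the same value (the Comet behaviour described above)."""
    out, seen, seen_nan = [], set(), False
    for v in values:
        if _is_nan(v):
            if not seen_nan:
                seen_nan = True
                out.append(v)
        elif v not in seen:
            seen.add(v)
            out.append(v)
    return out


def collect_set_nan_distinct(values):
    """Deduplicate, keeping each NaN as a distinct value (the Spark behaviour described above)."""
    out, seen = [], set()
    for v in values:
        if _is_nan(v):
            out.append(v)  # NaN never matches an earlier NaN
        elif v not in seen:
            seen.add(v)
            out.append(v)
    return out


data = [1.0, float("nan"), 1.0, float("nan")]
print(len(collect_set_nan_equal(data)), len(collect_set_nan_distinct(data)))
```

With two NaN inputs, the first function keeps one NaN and the second keeps both, which is the kind of mismatch that makes `collect_set` on floating-point types fall back by default.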
diff --git a/docs/source/user-guide/latest/compatibility/expressions/array.md
b/docs/source/user-guide/latest/compatibility/expressions/array.md
index c7f2569b4..d5f3eff81 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/array.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/array.md
@@ -20,4 +20,60 @@ under the License.
# Array Expressions
<!--BEGIN:EXPR_COMPAT[array]-->
+
+## ArrayExcept
+
+The following incompatibilities cause `ArrayExcept` to fall back to Spark by
default. Set `spark.comet.expression.ArrayExcept.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Null handling and ordering may differ from Spark
+
+## ArrayFilter
+
+The following cases are not supported by Comet:
+
+- Only supports `array_filter` when the function is `IsNotNull` (used by
`array_compact`)
+
+## ArrayIntersect
+
+The following incompatibilities cause `ArrayIntersect` to fall back to Spark
by default. Set `spark.comet.expression.ArrayIntersect.allowIncompatible=true`
to enable Comet acceleration despite these differences.
+
+- Result array element order may differ from Spark when the right array is
longer than the left (DataFusion probes the longer side).
+
+The following cases are not supported by Comet:
+
+- array_intersect on collated strings is not supported.
+
+## ArrayJoin
+
+The following incompatibilities cause `ArrayJoin` to fall back to Spark by
default. Set `spark.comet.expression.ArrayJoin.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Null handling may differ from Spark
+
+## ArraysZip
+
+The following cases are not supported by Comet:
+
+- Not all input data types are supported; falls back to Spark for unsupported
types
+
+## ElementAt
+
+The following cases are not supported by Comet:
+
+- Input must be an array. `Map` inputs are not supported.
+
+## Size
+
+The following cases are not supported by Comet:
+
+- Only supports `ArrayType` input; `MapType` input is not supported
+
+## SortArray
+
+The following incompatibilities cause `SortArray` to fall back to Spark by
default. Set `spark.comet.expression.SortArray.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- When `spark.comet.exec.strictFloatingPoint=true`, sorting on floating-point
types is not 100% compatible with Spark
+
+The following cases are not supported by Comet:
+
+- Nested arrays with `Struct` or `Null` child values are not supported
natively and will fall back to Spark.
<!--END:EXPR_COMPAT-->
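
The sections above all name a setting of the form `spark.comet.expression.<Name>.allowIncompatible`. As a small sketch of how those keys could be assembled for several expressions at once, the helpers below build the key/value pairs and render them as `--conf` arguments. The helper names `allow_incompatible_confs` and `as_submit_args` are hypothetical and are not part of Comet.

```python
def allow_incompatible_confs(expression_names):
    """Map each expression name to its allowIncompatible config key, set to "true"."""
    return {
        f"spark.comet.expression.{name}.allowIncompatible": "true"
        for name in expression_names
    }


def as_submit_args(confs):
    """Render the settings as --conf key=value arguments, sorted by key."""
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


confs = allow_incompatible_confs(["ArrayExcept", "SortArray"])
print(as_submit_args(confs))
```

Enabling these settings opts in to the listed differences, so it is probably safest to enable them one expression at a time after checking the corresponding notes.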
diff --git a/docs/source/user-guide/latest/compatibility/expressions/cast.md
b/docs/source/user-guide/latest/compatibility/expressions/cast.md
index f7182e571..720f485c7 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/cast.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/cast.md
@@ -149,16 +149,88 @@ as `"1.23E+4"`).
## Legacy Mode
<!--BEGIN:CAST_LEGACY_TABLE-->
+<!-- prettier-ignore-start -->
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp | timestamp_ntz |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A | N/A |
+| boolean | N/A | - | C | N/A | C | C | C | C | C | C | C | C | N/A |
+| byte | C | C | - | N/A | C | C | C | C | C | C | C | C | N/A |
+| date | N/A | C | C | - | C | C | C | C | C | C | C | C | C |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | C | N/A |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | C | N/A |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | C | N/A |
+| integer | C | C | C | N/A | C | C | C | - | C | C | C | C | N/A |
+| long | C | C | C | N/A | C | C | C | C | - | C | C | C | N/A |
+| short | C | C | C | N/A | C | C | C | C | C | - | C | C | N/A |
+| string | C | C | C | C | C | C | C | C | C | C | - | C | C |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - | C |
+| timestamp_ntz | N/A | N/A | N/A | C | N/A | N/A | N/A | N/A | N/A | N/A | C | C | - |
+
+**Notes:**
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example,
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the
input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+<!-- prettier-ignore-end -->
<!--END:CAST_LEGACY_TABLE-->
## Try Mode
<!--BEGIN:CAST_TRY_TABLE-->
+<!-- prettier-ignore-start -->
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp | timestamp_ntz |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A | N/A |
+| boolean | N/A | - | C | N/A | C | C | C | C | C | C | C | U | N/A |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | C | N/A |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | C | C |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | C | N/A |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | C | N/A |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | C | N/A |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | C | N/A |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | C | N/A |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | C | N/A |
+| string | C | C | C | C | C | C | C | C | C | C | - | C | C |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - | C |
+| timestamp_ntz | N/A | N/A | N/A | C | N/A | N/A | N/A | N/A | N/A | N/A | C | C | - |
+
+**Notes:**
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example,
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the
input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+<!-- prettier-ignore-end -->
<!--END:CAST_TRY_TABLE-->
## ANSI Mode
<!--BEGIN:CAST_ANSI_TABLE-->
+<!-- prettier-ignore-start -->
+| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp | timestamp_ntz |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A | N/A |
+| boolean | N/A | - | C | N/A | C | C | C | C | C | C | C | U | N/A |
+| byte | U | C | - | N/A | C | C | C | C | C | C | C | C | N/A |
+| date | N/A | U | U | - | U | U | U | U | U | U | C | C | C |
+| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | C | N/A |
+| double | N/A | C | C | N/A | I | - | C | C | C | C | C | C | N/A |
+| float | N/A | C | C | N/A | I | C | - | C | C | C | C | C | N/A |
+| integer | U | C | C | N/A | C | C | C | - | C | C | C | C | N/A |
+| long | U | C | C | N/A | C | C | C | C | - | C | C | C | N/A |
+| short | U | C | C | N/A | C | C | C | C | C | - | C | C | N/A |
+| string | C | C | C | C | C | C | C | C | C | C | - | C | C |
+| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - | C |
+| timestamp_ntz | N/A | N/A | N/A | C | N/A | N/A | N/A | N/A | N/A | N/A | C | C | - |
+
+**Notes:**
+- **double -> decimal**: There can be rounding differences
+- **double -> string**: There can be differences in precision. For example,
the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **float -> decimal**: There can be rounding differences
+- **float -> string**: There can be differences in precision. For example, the
input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
+- **string -> date**: Only supports years between 262143 BC and 262142 AD
+<!-- prettier-ignore-end -->
<!--END:CAST_ANSI_TABLE-->
See the [tracking
issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
diff --git
a/docs/source/user-guide/latest/compatibility/expressions/datetime.md
b/docs/source/user-guide/latest/compatibility/expressions/datetime.md
index afd934dc0..612bc6ead 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/datetime.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/datetime.md
@@ -44,4 +44,82 @@ If you need to process dates far in the future with accurate
timezone handling,
<!--BEGIN:EXPR_COMPAT[datetime]-->
+## DateFormatClass
+
+The following incompatibilities cause `DateFormatClass` to fall back to Spark
by default. Set `spark.comet.expression.DateFormatClass.allowIncompatible=true`
to enable Comet acceleration despite these differences.
+
+- Non-UTC timezones may produce different results than Spark
+
+The following cases are not supported by Comet:
+
+- Only the following formats are supported:
+ - `EEE`
+ - `EEEE`
+ - `HH`
+ - `HH:mm`
+ - `HH:mm:ss`
+ - `MM`
+ - `MMM`
+ - `MMMM`
+ - `dd`
+ - `h:mm a`
+ - `hh:mm a`
+ - `hh:mm:ss a`
+ - `mm`
+ - `ss`
+ - `yy`
+ - `yyyy`
+ - `yyyy-MM-dd`
+ - `yyyy-MM-dd HH:mm:ss`
+ - `yyyy-MM-dd'T'HH:mm:ss`
+ - `yyyy/MM/dd`
+ - `yyyy/MM/dd HH:mm:ss`
+ - `yyyyMM`
+ - `yyyyMMdd`
+
+## FromUnixTime
+
+The following incompatibilities cause `FromUnixTime` to fall back to Spark by
default. Set `spark.comet.expression.FromUnixTime.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Only supports the default datetime format pattern `yyyy-MM-dd HH:mm:ss`.
DataFusion's valid timestamp range differs from Spark
(https://github.com/apache/datafusion/issues/16594)
+
+## Hour
+
+The following incompatibilities cause `Hour` to fall back to Spark by default.
Set `spark.comet.expression.Hour.allowIncompatible=true` to enable Comet
acceleration despite these differences.
+
+- Incorrectly applies timezone conversion to TimestampNTZ inputs
(https://github.com/apache/datafusion-comet/issues/3180)
+
+## Minute
+
+The following incompatibilities cause `Minute` to fall back to Spark by
default. Set `spark.comet.expression.Minute.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Incorrectly applies timezone conversion to TimestampNTZ inputs
(https://github.com/apache/datafusion-comet/issues/3180)
+
+## Second
+
+The following incompatibilities cause `Second` to fall back to Spark by
default. Set `spark.comet.expression.Second.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Incorrectly applies timezone conversion to TimestampNTZ inputs
(https://github.com/apache/datafusion-comet/issues/3180)
+
+## TruncDate
+
+The following incompatibilities cause `TruncDate` to fall back to Spark by
default. Set `spark.comet.expression.TruncDate.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Non-literal format strings will throw an exception instead of returning NULL
+
+The following cases are not supported by Comet:
+
+- Only the following formats are supported: year, yyyy, yy, quarter, mon,
month, mm, week
+
+## TruncTimestamp
+
+The following incompatibilities cause `TruncTimestamp` to fall back to Spark
by default. Set `spark.comet.expression.TruncTimestamp.allowIncompatible=true`
to enable Comet acceleration despite these differences.
+
+- Produces incorrect results when used with non-UTC timezones. Compatible when
timezone is UTC. (https://github.com/apache/datafusion-comet/issues/2649)
+
+## UnixTimestamp
+
+The following cases are not supported by Comet:
+
+- Only `TimestampType` and `DateType` inputs are supported. `TimestampNTZType`
is not supported because Comet incorrectly applies timezone conversion to
TimestampNTZ values.
<!--END:EXPR_COMPAT-->
diff --git a/docs/source/user-guide/latest/compatibility/expressions/map.md
b/docs/source/user-guide/latest/compatibility/expressions/map.md
index 9368ae68d..ec9fd6391 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/map.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/map.md
@@ -31,4 +31,11 @@ IEEE total ordering for floating-point, which differs from
Spark's `Double.compa
`NaN` and `-0.0`.
<!--BEGIN:EXPR_COMPAT[map]-->
+
+## MapFromEntries
+
+The following incompatibilities cause `MapFromEntries` to fall back to Spark
by default. Set `spark.comet.expression.MapFromEntries.allowIncompatible=true`
to enable Comet acceleration despite these differences.
+
+- Using BinaryType as Map keys is not allowed in map_from_entries
+- Using BinaryType as Map values is not allowed in map_from_entries
<!--END:EXPR_COMPAT-->
diff --git a/docs/source/user-guide/latest/compatibility/expressions/math.md
b/docs/source/user-guide/latest/compatibility/expressions/math.md
index 6d8905adf..f347f697e 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/math.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/math.md
@@ -20,4 +20,10 @@ under the License.
# Math Expressions
<!--BEGIN:EXPR_COMPAT[math]-->
+
+## Abs
+
+The following cases are not supported by Comet:
+
+- Only integral, floating-point, and decimal types are supported
<!--END:EXPR_COMPAT-->
diff --git a/docs/source/user-guide/latest/compatibility/expressions/misc.md
b/docs/source/user-guide/latest/compatibility/expressions/misc.md
index 58d9a0e12..185282dcd 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/misc.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/misc.md
@@ -20,4 +20,34 @@ under the License.
# Miscellaneous Expressions
<!--BEGIN:EXPR_COMPAT[misc]-->
+
+## CheckOverflow
+
+The following cases are not supported by Comet:
+
+- Only `DecimalType` is supported
+
+## KnownFloatingPointNormalized
+
+The following cases are not supported by Comet:
+
+- Only supports `NormalizeNaNAndZero` child expressions
+
+## Literal
+
+The following cases are not supported by Comet:
+
+- Not all data types are supported for literal values
+
+## MakeDecimal
+
+The following cases are not supported by Comet:
+
+- Only `LongType` input is supported
+
+## SortOrder
+
+The following incompatibilities cause `SortOrder` to fall back to Spark by
default. Set `spark.comet.expression.SortOrder.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- When `spark.comet.exec.strictFloatingPoint=true`, sorting on floating-point
types is not 100% compatible with Spark
<!--END:EXPR_COMPAT-->
diff --git a/docs/source/user-guide/latest/compatibility/expressions/string.md
b/docs/source/user-guide/latest/compatibility/expressions/string.md
index d86f299c0..dc7466776 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/string.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/string.md
@@ -20,4 +20,99 @@ under the License.
# String Expressions
<!--BEGIN:EXPR_COMPAT[string]-->
+
+## Concat
+
+The following incompatibilities cause `Concat` to fall back to Spark by
default. Set `spark.comet.expression.Concat.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- CONCAT supports only string input parameters
+
+## GetJsonObject
+
+The following incompatibilities cause `GetJsonObject` to fall back to Spark by
default. Set `spark.comet.expression.GetJsonObject.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Spark allows single-quoted JSON and unescaped control characters which Comet
does not support
+
+## InitCap
+
+The following incompatibilities cause `InitCap` to fall back to Spark by
default. Set `spark.comet.expression.InitCap.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Treats hyphen as a word separator (e.g. `robert rose-smith` produces `Robert
Rose-Smith` instead of Spark's `Robert Rose-smith`)
(https://github.com/apache/datafusion-comet/issues/1052)
+
+## Left
+
+The following cases are not supported by Comet:
+
+- Only supports `BinaryType` and `StringType` input
+- The length argument must be a literal value
+
+## Length
+
+The following cases are not supported by Comet:
+
+- `BinaryType` input is not supported
+
+## Lower
+
+The following incompatibilities cause `Lower` to fall back to Spark by
default. Set `spark.comet.expression.Lower.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Results can vary depending on locale and character set. Requires
`spark.comet.caseConversion.enabled=true` to enable.
+
+## RLike
+
+The following incompatibilities cause `RLike` to fall back to Spark by
default. Set `spark.comet.expression.RLike.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Uses the Rust regexp engine, which behaves differently from the Java regexp engine
+
+## RegExpReplace
+
+The following incompatibilities cause `RegExpReplace` to fall back to Spark by
default. Set `spark.comet.expression.RegExpReplace.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Regexp pattern may not be compatible with Spark
+
+The following cases are not supported by Comet:
+
+- Only supports `regexp_replace` with an offset of 1 (no offset)
+
+## Reverse
+
+The following incompatibilities cause `Reverse` to fall back to Spark by
default. Set `spark.comet.expression.Reverse.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- reverse on array containing binary is not supported
+
+## Right
+
+The following cases are not supported by Comet:
+
+- Only supports `StringType` input
+
+## StringLPad
+
+The following cases are not supported by Comet:
+
+- Scalar values are not supported for the `str` argument. Only scalar values
are supported for the `pad` argument.
+
+## StringRPad
+
+The following cases are not supported by Comet:
+
+- Scalar values are not supported for the `str` argument. Only scalar values
are supported for the `pad` argument.
+
+## StringRepeat
+
+The following differences from Spark are always present and do not require any
additional configuration:
+
+- A negative argument for the number of times to repeat throws an exception
instead of returning an empty string as Spark does
+
+## StringSplit
+
+The following incompatibilities cause `StringSplit` to fall back to Spark by
default. Set `spark.comet.expression.StringSplit.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Regex engine differences between Java and Rust
+
+## Upper
+
+The following incompatibilities cause `Upper` to fall back to Spark by
default. Set `spark.comet.expression.Upper.allowIncompatible=true` to enable
Comet acceleration despite these differences.
+
+- Results can vary depending on locale and character set. Requires
`spark.comet.caseConversion.enabled=true` to enable.
<!--END:EXPR_COMPAT-->
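
The `InitCap` note above comes down to which characters count as word separators. The sketch below reproduces the documented example in plain Python. The function `initcap` and its `separators` parameter are illustrative assumptions, not the Comet or Spark implementation.

```python
def initcap(text, separators=" "):
    """Upper-case the first letter of each word and lower-case the rest.

    A new word starts at the beginning of the text and after any
    character listed in `separators`.
    """
    out = []
    start_of_word = True
    for ch in text:
        if ch in separators:
            out.append(ch)
            start_of_word = True
        else:
            out.append(ch.upper() if start_of_word else ch.lower())
            start_of_word = False
    return "".join(out)


# Space-only separators, matching the Spark result quoted above.
print(initcap("robert rose-smith"))
# Space and hyphen as separators, matching the Comet result quoted above.
print(initcap("robert rose-smith", separators=" -"))
```

Inputs without hyphens give the same result under both separator sets, so in this sketch the difference only appears for hyphenated words.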
diff --git a/docs/source/user-guide/latest/compatibility/expressions/struct.md
b/docs/source/user-guide/latest/compatibility/expressions/struct.md
index 1eaaf4a5e..3c082ef8b 100644
--- a/docs/source/user-guide/latest/compatibility/expressions/struct.md
+++ b/docs/source/user-guide/latest/compatibility/expressions/struct.md
@@ -20,4 +20,30 @@ under the License.
# Struct Expressions
<!--BEGIN:EXPR_COMPAT[struct]-->
+
+## JsonToStructs
+
+The following incompatibilities cause `JsonToStructs` to fall back to Spark by
default. Set `spark.comet.expression.JsonToStructs.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Partially implemented and not comprehensively tested
+
+The following cases are not supported by Comet:
+
+- Requires an explicit schema
+
+## StructsToCsv
+
+The following incompatibilities cause `StructsToCsv` to fall back to Spark by
default. Set `spark.comet.expression.StructsToCsv.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Date, Timestamp, TimestampNTZ, and Binary data types may produce different
results (https://github.com/apache/datafusion-comet/issues/3232)
+
+The following cases are not supported by Comet:
+
+- Complex types (arrays, maps, structs) in the schema are not supported
+
+## StructsToJson
+
+The following incompatibilities cause `StructsToJson` to fall back to Spark by
default. Set `spark.comet.expression.StructsToJson.allowIncompatible=true` to
enable Comet acceleration despite these differences.
+
+- Does not support `+Infinity` and `-Infinity` for numeric types (float,
double). (https://github.com/apache/datafusion-comet/issues/3016)
<!--END:EXPR_COMPAT-->
diff --git a/docs/source/user-guide/latest/configs.md
b/docs/source/user-guide/latest/configs.md
index a268691a3..bb49c351c 100644
--- a/docs/source/user-guide/latest/configs.md
+++ b/docs/source/user-guide/latest/configs.md
@@ -24,16 +24,57 @@ Comet provides the following configuration settings.
## Scan Configuration Settings
<!--BEGIN:CONFIG_TABLE[scan]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.scan.icebergNative.dataFileConcurrencyLimit` | The number of
Iceberg data files to read concurrently within a single task. Higher values
improve throughput for tables with many small files by overlapping I/O latency,
but increase memory usage. Values between 2 and 8 are suggested. | 1 |
+| `spark.comet.scan.icebergNative.enabled` | Whether to enable native Iceberg
table scan using iceberg-rust. When enabled, Iceberg tables are read directly
through native execution, bypassing Spark's DataSource V2 API for better
performance. | true |
+| `spark.comet.scan.unsignedSmallIntSafetyCheck` | Parquet files may contain
unsigned 8-bit integers (UINT_8) which Spark maps to ShortType. When this
config is true (default), Comet falls back to Spark for ShortType columns
because we cannot distinguish signed INT16 (safe) from unsigned UINT_8 (may
produce different results). Set to false to allow native execution of ShortType
columns if you know your data does not contain unsigned UINT_8 columns from
improperly encoded Parquet files. F [...]
+| `spark.hadoop.fs.comet.libhdfs.schemes` | Defines filesystem schemes (e.g.,
hdfs, webhdfs) that the native side accesses via libhdfs, separated by commas.
Valid only when built with hdfs feature enabled. | |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
## Parquet Reader Configuration Settings
<!--BEGIN:CONFIG_TABLE[parquet]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.parquet.read.io.adjust.readRange.skew` | In the parallel
reader, if the read ranges submitted are skewed in sizes, this option will
cause the reader to break up larger read ranges into smaller ranges to reduce
the skew. This will result in a slightly larger number of connections opened to
the file system but may give improved performance. | false |
+| `spark.comet.parquet.read.io.mergeRanges` | When enabled the parallel reader
will try to merge ranges of data that are separated by less than
`comet.parquet.read.io.mergeRanges.delta` bytes. Longer continuous reads are
faster on cloud storage. | true |
+| `spark.comet.parquet.read.io.mergeRanges.delta` | The delta in bytes between
consecutive read ranges below which the parallel reader will try to merge the
ranges. The default is 8MB. | 8388608 |
+| `spark.comet.parquet.read.parallel.io.enabled` | Whether to enable Comet's
parallel reader for Parquet files. The parallel reader reads ranges of
consecutive data in a file in parallel. It is faster for large files and row
groups but uses more resources. | true |
+| `spark.comet.parquet.read.parallel.io.thread-pool.size` | The maximum number
of parallel threads the parallel reader will use in a single executor. For
executors configured with a smaller number of cores, use a smaller number. | 16
|
+| `spark.comet.parquet.respectFilterPushdown` | Whether to respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config. This needs to be respected when running the Spark SQL test suite, but the default setting results in poor performance in Comet when using the new native scans, so this config is disabled by default. | false |
+| `spark.comet.scan.impl` | The implementation of Comet's Parquet scan to use. Available scans are `native_datafusion` and `native_iceberg_compat`. `native_datafusion` is a fully native implementation, and `native_iceberg_compat` is a hybrid implementation that supports some additional features, such as row indexes and field ids. `auto` (default) chooses the best available scan based on the scan schema. It can be overridden by the environment variable `COMET_PARQUET_SCAN_IMPL`. | auto |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
## Query Execution Settings
<!--BEGIN:CONFIG_TABLE[exec]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.caseConversion.enabled` | Java uses locale-specific rules when
converting strings to upper or lower case and Rust does not, so we disable
upper and lower by default. | false |
+| `spark.comet.convert.csv.enabled` | When enabled, data from Spark
(non-native) CSV v1 and v2 scans will be converted to Arrow format. | false |
+| `spark.comet.convert.json.enabled` | When enabled, data from Spark
(non-native) JSON v1 and v2 scans will be converted to Arrow format. | false |
+| `spark.comet.convert.parquet.enabled` | When enabled, data from Spark
(non-native) Parquet v1 and v2 scans will be converted to Arrow format. | false
|
+| `spark.comet.debug.enabled` | Whether to enable debug mode for Comet. When enabled, Comet will do additional checks for debugging purposes, for example, validating arrays when importing them from the JVM on the native side. Note that these checks may be expensive and should only be enabled for debugging. | false |
+| `spark.comet.enabled` | Whether to enable the Comet extension for Spark. When this is turned on, Spark will use Comet to read Parquet data sources. Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. It can be overridden by the environment variable `ENABLE_COMET`. | true |
+| `spark.comet.exceptionOnDatetimeRebase` | Whether to throw exception when
seeing dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar.
Since Spark 3, dates/timestamps were written according to the Proleptic
Gregorian calendar. When this is true, Comet will throw exceptions when seeing
these dates/timestamps that were written by Spark version before 3.0. If this
is false, these dates/timestamps will be read as if they were written to the
Proleptic Gregorian calendar and [...]
+| `spark.comet.exec.columnarToRow.native.enabled` | Whether to enable native
columnar to row conversion. When enabled, Comet will use native Rust code to
convert Arrow columnar data to Spark UnsafeRow format instead of the JVM
implementation. This can improve performance for queries that need to convert
between columnar and row formats. | true |
+| `spark.comet.exec.enabled` | Whether to enable Comet native vectorized execution for Spark. This controls whether Spark should convert operators into their Comet counterparts and execute them natively. Note: each operator currently has a separate config of the form `spark.comet.exec.<operator_name>.enabled`, and both that config and this one must be turned on for the operator to be executed natively. | true |
+| `spark.comet.exec.replaceSortMergeJoin` | Experimental feature to force
Spark to replace SortMergeJoin with ShuffledHashJoin for improved performance.
This feature is not stable yet. For more information, refer to the [Comet
Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). |
false |
+| `spark.comet.exec.strictFloatingPoint` | When enabled, fall back to Spark
for floating-point operations that may differ from Spark, such as when
comparing or sorting -0.0 and 0.0. For more information, refer to the [Comet
Compatibility
Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). |
false |
+| `spark.comet.maxTempDirectorySize` | The maximum amount of data (in bytes)
stored inside the temporary directories. | 107374182400b |
+| `spark.comet.metrics.enabled` | Whether to enable Comet metrics reporting
through Spark's external monitoring system. When enabled, Comet exposes metrics
such as native operators, Spark operators, queries planned, transitions, and
acceleration ratio. These metrics can be visualized through tools like Grafana
when a metrics sink (e.g., Prometheus) is configured. Disabled by default
because Spark plan traversal adds overhead and metrics require a sink to be
useful. This config must be se [...]
+| `spark.comet.metrics.updateInterval` | The interval in milliseconds to
update metrics. If interval is negative, metrics will be updated upon task
completion. | 3000 |
+| `spark.comet.nativeLoadRequired` | Whether to require the Comet native library to load successfully when Comet is enabled. If not, Comet will silently fall back to Spark when it fails to load the native lib. Otherwise, an error will be thrown and the Spark job will be aborted. | false |
+| `spark.comet.operator.DataWritingCommandExec.allowIncompatible` | Whether to
allow incompatibility for operator: DataWritingCommandExec. False by default.
It can be overridden by the environment variable
`SPARK_COMET_OPERATOR_DATAWRITINGCOMMANDEXEC_ALLOWINCOMPATIBLE`. | false |
+| `spark.comet.sparkToColumnar.enabled` | Whether to enable Spark to Arrow
columnar conversion. When this is turned on, Comet will convert operators in
`spark.comet.sparkToColumnar.supportedOperatorList` into Arrow columnar format
before processing. | false |
+| `spark.comet.sparkToColumnar.supportedOperatorList` | A comma-separated list
of operators that will be converted to Arrow columnar format when
`spark.comet.sparkToColumnar.enabled` is true. |
Range,InMemoryTableScan,RDDScan,OneRowRelation |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
## Viewing Explain Plan & Fallback Reasons
@@ -41,34 +82,321 @@ Comet provides the following configuration settings.
These settings can be used to determine which parts of the plan are
accelerated by Comet and to see why some parts of the plan could not be
supported by Comet.
<!--BEGIN:CONFIG_TABLE[exec_explain]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.explain.format` | Choose extended explain output. The default
format of 'verbose' will provide the full query plan annotated with fallback
reasons as well as a summary of how much of the plan was accelerated by Comet.
The format 'fallback' provides a list of fallback reasons instead. | verbose |
+| `spark.comet.explain.native.enabled` | When this setting is enabled, Comet
will provide a tree representation of the native query plan before execution
and again after execution, with metrics. | false |
+| `spark.comet.explain.rules` | When this setting is enabled, Comet will log
all plan transformations performed in physical optimizer rules. | false |
+| `spark.comet.explainFallback.enabled` | When this setting is enabled, Comet
will provide logging explaining the reason(s) why a query stage cannot be
executed natively. Set this to false to reduce the amount of logging. | false |
+| `spark.comet.logFallbackReasons.enabled` | When this setting is enabled,
Comet will log warnings for all fallback reasons. It can be overridden by the
environment variable `ENABLE_COMET_LOG_FALLBACK_REASONS`. | false |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
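
As a minimal sketch of how the explain settings above might be passed on the command line, the following Python helper renders a dictionary of settings as `spark-submit` `--conf` arguments. The helper name `to_conf_flags` and the chosen values are illustrative assumptions; only the config keys come from the table above.

```python
def to_conf_flags(settings):
    """Render settings as spark-submit --conf arguments (hypothetical helper)."""
    flags = []
    for key in sorted(settings):
        value = settings[key]
        # Write booleans as lowercase true/false, matching the table's notation.
        if isinstance(value, bool):
            value = "true" if value else "false"
        flags.extend(["--conf", f"{key}={value}"])
    return flags


# Example values only: list fallback reasons and log them as warnings.
explain_settings = {
    "spark.comet.explain.format": "fallback",
    "spark.comet.explainFallback.enabled": True,
    "spark.comet.logFallbackReasons.enabled": True,
}

print(" ".join(to_conf_flags(explain_settings)))
```

The resulting arguments can be appended to a `spark-submit` invocation; the same keys could equally be set in `spark-defaults.conf`.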
## Shuffle Configuration Settings
<!--BEGIN:CONFIG_TABLE[shuffle]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.columnar.shuffle.async.enabled` | Whether to enable
asynchronous shuffle for Arrow-based shuffle. | false |
+| `spark.comet.columnar.shuffle.async.max.thread.num` | Maximum number of
threads on an executor used for Comet async columnar shuffle. This is the upper
bound on the total number of shuffle threads per executor. In other words, if the
number of cores multiplied by the number of shuffle threads per task
(`spark.comet.columnar.shuffle.async.thread.num`) is larger than this config,
Comet will use this config as the number of shuffle threads per executor
instead. | 100 |
+| `spark.comet.columnar.shuffle.async.thread.num` | Number of threads used for
Comet async columnar shuffle per shuffle task. Note that more threads require
more memory to buffer shuffle data before flushing to disk. Also,
more threads may not always improve performance, so this should be set based on
the number of cores available. | 3 |
+| `spark.comet.columnar.shuffle.batch.size` | Batch size when writing out
sorted spill files on the native side. Note that this should not be larger than
the batch size (i.e., `spark.comet.batchSize`). Otherwise it will produce larger
batches than expected in the native operator after shuffle. | 8192 |
+| `spark.comet.exec.shuffle.compression.codec` | The codec of Comet native
shuffle used to compress shuffle data. lz4, zstd, and snappy are supported.
Compression can be disabled by setting `spark.shuffle.compress=false`. | lz4 |
+| `spark.comet.exec.shuffle.compression.zstd.level` | The compression level to
use when compressing shuffle files with zstd. | 1 |
+| `spark.comet.exec.shuffle.convertFromSparkPlan.enabled` | When enabled,
Comet will convert a Spark `ShuffleExchangeExec` to a Comet columnar shuffle
even when its child is a non-Comet (Spark) plan. Disable to leave such shuffles
as native Spark shuffles, restricting Comet shuffle to cases where the child is
already a Comet plan. | true |
+| `spark.comet.exec.shuffle.directRead.enabled` | When enabled, native
operators that consume shuffle output will read compressed shuffle blocks
directly in native code, bypassing Arrow FFI. Applies to both native shuffle
and JVM columnar shuffle. Requires `spark.comet.exec.shuffle.enabled` to be true.
| true |
+| `spark.comet.exec.shuffle.enabled` | Whether to enable Comet native shuffle.
Note that this requires setting `spark.shuffle.manager` to
`org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager`.
`spark.shuffle.manager` must be set before starting the Spark application and
cannot be changed during the application. | true |
+| `spark.comet.exec.shuffle.revertRedundantColumnar.enabled` | When enabled,
Comet reverts a `CometShuffleExchangeExec` with `CometColumnarShuffle` back to
Spark's `ShuffleExchangeExec` when both its parent and child are non-Comet hash
aggregate operators. This avoids a redundant row -> Arrow -> shuffle -> Arrow
-> row conversion when no Comet operator on either side can consume columnar
output. Disable to keep Comet columnar shuffle even in that case, which
preserves Comet's off-heap sh [...]
+| `spark.comet.exec.shuffle.writeBufferSize` | Size of the write buffer in
bytes used by the native shuffle writer when writing shuffle data to disk.
Larger values may improve write performance by reducing the number of system
calls, but will use more memory. The default is 1MB which provides a good
balance between performance and memory usage. | 1048576b |
+| `spark.comet.native.shuffle.partitioning.hash.enabled` | Whether to enable
hash partitioning for Comet native shuffle. | true |
+| `spark.comet.native.shuffle.partitioning.range.enabled` | Whether to enable
range partitioning for Comet native shuffle. | true |
+| `spark.comet.native.shuffle.partitioning.roundrobin.enabled` | Whether to
enable round robin partitioning for Comet native shuffle. This is disabled by
default because Comet's round-robin produces different partition assignments
than Spark. Spark sorts rows by their binary UnsafeRow representation before
assigning partitions, but Comet uses Arrow format which has a different binary
layout. Instead, Comet implements round-robin as hash partitioning on all
columns, which achieves the sam [...]
+| `spark.comet.native.shuffle.partitioning.roundrobin.maxHashColumns` | The
maximum number of columns to hash for round robin partitioning. When set to 0
(the default), all columns are hashed. When set to a positive value, only the
first N columns are used for hashing, which can improve performance for wide
tables while still providing reasonable distribution. | 0 |
+| `spark.comet.shuffle.preferDictionary.ratio` | The ratio of total values to
distinct values in a string column to decide whether to prefer dictionary
encoding when shuffling the column. If the ratio is higher than this config,
dictionary encoding will be used when shuffling the string column. This config
only takes effect if it is higher than 1.0. Note that this config is only used when
`spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
+| `spark.comet.shuffle.sizeInBytesMultiplier` | Comet reports smaller sizes
for shuffle because it uses Arrow's columnar memory format, and this can result in
Spark choosing a different join strategy due to the estimated size of the
exchange being smaller. Comet will multiply sizeInBytes by this amount to avoid
regressions in join strategy. | 1.0 |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
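
The interaction between the two async thread settings above can be illustrated with a small Python sketch. It models only the rule as described in the table (cores multiplied by threads per task, capped by the maximum); the function name is hypothetical and this is not Comet's actual implementation.

```python
def effective_async_shuffle_threads(executor_cores, threads_per_task=3, max_threads=100):
    """Model the documented cap on async shuffle threads per executor.

    threads_per_task mirrors spark.comet.columnar.shuffle.async.thread.num
    (default 3) and max_threads mirrors
    spark.comet.columnar.shuffle.async.max.thread.num (default 100).
    """
    requested = executor_cores * threads_per_task
    # The maximum acts as an upper bound on the per-executor total.
    return min(requested, max_threads)


# With the defaults, 8 cores stay under the cap, while 64 cores hit it.
print(effective_async_shuffle_threads(8))
print(effective_async_shuffle_threads(64))
```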
## Memory & Tuning Configuration Settings
<!--BEGIN:CONFIG_TABLE[tuning]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.batchSize` | The columnar batch size, i.e., the maximum number
of rows that a batch can contain. | 8192 |
+| `spark.comet.exec.memoryPool` | The type of memory pool to be used for Comet
native execution when running Spark in off-heap mode. Available pool types are
`greedy_unified` and `fair_unified`. For more information, refer to the [Comet
Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). |
fair_unified |
+| `spark.comet.exec.memoryPool.fraction` | Fraction of off-heap memory pool
that is available to Comet. Only applies to off-heap mode. For more
information, refer to the [Comet Tuning
Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
+| `spark.comet.tracing.enabled` | Enable fine-grained tracing of events and
memory usage. For more information, refer to the [Comet Tracing
Guide](https://datafusion.apache.org/comet/contributor-guide/tracing.html). |
false |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
## Development & Testing Settings
<!--BEGIN:CONFIG_TABLE[testing]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.columnar.shuffle.memory.factor` | Fraction of Comet memory to
be allocated per executor process for columnar shuffle when running in on-heap
mode. For more information, refer to the [Comet Tuning
Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | 1.0 |
+| `spark.comet.debug.memory` | When enabled, log all native memory pool
interactions. For more information, refer to the [Comet Debugging
Guide](https://datafusion.apache.org/comet/contributor-guide/debugging.html). |
false |
+| `spark.comet.exec.onHeap.enabled` | Whether to allow Comet to run in on-heap
mode. Required for running Spark SQL tests. It can be overridden by the
environment variable `ENABLE_COMET_ONHEAP`. | false |
+| `spark.comet.exec.onHeap.memoryPool` | The type of memory pool to be used
for Comet native execution when running Spark in on-heap mode. Available pool
types are `greedy`, `fair_spill`, `greedy_task_shared`,
`fair_spill_task_shared`, `greedy_global`, `fair_spill_global`, and
`unbounded`. | greedy_task_shared |
+| `spark.comet.exec.respectDataFusionConfigs` | Development and testing
configuration option to allow DataFusion configs set in Spark configuration
settings starting with `spark.comet.datafusion.` to be passed into native
execution. | false |
+| `spark.comet.memoryOverhead` | The amount of additional memory to be
allocated per executor process for Comet, in MiB, when running Spark in on-heap
mode. | 1024 MiB |
+| `spark.comet.parquet.write.enabled` | Whether to enable native Parquet write
through Comet. When enabled, Comet will intercept Parquet write operations and
execute them natively. This feature is highly experimental and only partially
implemented. It should not be used in production. It can be overridden by the
environment variable `ENABLE_COMET_WRITE`. | false |
+| `spark.comet.scan.csv.v2.enabled` | Whether to use the native Comet V2 CSV
reader for improved performance. When false (the default), the standard Spark
CSV reader is used. Experimental: performance benefits are workload-dependent.
| false |
+| `spark.comet.scan.enabled` | Whether to enable native scans. Intended for
use in Comet's own test suites to selectively disable native scans; not
intended for production use. | true |
+| `spark.comet.testing.strict` | Experimental option to enable strict testing,
which will fail tests that could be more comprehensive, such as checking for a
specific fallback reason. It can be overridden by the environment variable
`ENABLE_COMET_STRICT_TESTING`. | false |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
## Enabling or Disabling Individual Operators
<!--BEGIN:CONFIG_TABLE[enable_exec]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.exec.aggregate.enabled` | Whether to enable aggregate by
default. | true |
+| `spark.comet.exec.broadcastExchange.enabled` | Whether to enable
broadcastExchange by default. | true |
+| `spark.comet.exec.broadcastHashJoin.enabled` | Whether to enable
broadcastHashJoin by default. | true |
+| `spark.comet.exec.coalesce.enabled` | Whether to enable coalesce by default.
| true |
+| `spark.comet.exec.collectLimit.enabled` | Whether to enable collectLimit by
default. | true |
+| `spark.comet.exec.expand.enabled` | Whether to enable expand by default. |
true |
+| `spark.comet.exec.explode.enabled` | Whether to enable explode by default. |
true |
+| `spark.comet.exec.filter.enabled` | Whether to enable filter by default. |
true |
+| `spark.comet.exec.globalLimit.enabled` | Whether to enable globalLimit by
default. | true |
+| `spark.comet.exec.hashJoin.enabled` | Whether to enable hashJoin by default.
| true |
+| `spark.comet.exec.localLimit.enabled` | Whether to enable localLimit by
default. | true |
+| `spark.comet.exec.localTableScan.enabled` | Whether to enable localTableScan
by default. | false |
+| `spark.comet.exec.project.enabled` | Whether to enable project by default. |
true |
+| `spark.comet.exec.sort.enabled` | Whether to enable sort by default. | true |
+| `spark.comet.exec.sortMergeJoin.enabled` | Whether to enable sortMergeJoin
by default. | true |
+| `spark.comet.exec.sortMergeJoinWithJoinFilter.enabled` | Experimental
support for Sort Merge Join with filter | false |
+| `spark.comet.exec.takeOrderedAndProject.enabled` | Whether to enable
takeOrderedAndProject by default. | true |
+| `spark.comet.exec.union.enabled` | Whether to enable union by default. |
true |
+| `spark.comet.exec.window.enabled` | Whether to enable window by default. |
true |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
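
The description of `spark.comet.exec.enabled` states that both the global setting and the per-operator setting must be on for an operator to run natively. A minimal Python sketch of that rule follows; the function and the partial defaults map are illustrative assumptions based on the tables in this document, not Comet's own code.

```python
# Defaults copied from the table above for a handful of operators (not exhaustive).
OPERATOR_DEFAULTS = {
    "filter": True,
    "project": True,
    "sort": True,
    "localTableScan": False,
    "sortMergeJoinWithJoinFilter": False,
}


def operator_enabled(conf, operator):
    """Return True if both the global and per-operator flags are on.

    conf is a plain dict of Spark config strings; unset keys fall back to
    the documented defaults.
    """
    global_on = conf.get("spark.comet.exec.enabled", "true") == "true"
    default = "true" if OPERATOR_DEFAULTS[operator] else "false"
    op_key = f"spark.comet.exec.{operator}.enabled"
    operator_on = conf.get(op_key, default) == "true"
    return global_on and operator_on


print(operator_enabled({}, "filter"))
print(operator_enabled({"spark.comet.exec.enabled": "false"}, "filter"))
print(operator_enabled({"spark.comet.exec.localTableScan.enabled": "true"}, "localTableScan"))
```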
## Enabling or Disabling Individual Scalar Expressions
<!--BEGIN:CONFIG_TABLE[enable_expr]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.expression.Abs.enabled` | Enable Comet acceleration for `Abs` |
true |
+| `spark.comet.expression.Acos.enabled` | Enable Comet acceleration for `Acos`
| true |
+| `spark.comet.expression.Acosh.enabled` | Enable Comet acceleration for
`Acosh` | true |
+| `spark.comet.expression.Add.enabled` | Enable Comet acceleration for `Add` |
true |
+| `spark.comet.expression.Alias.enabled` | Enable Comet acceleration for
`Alias` | true |
+| `spark.comet.expression.And.enabled` | Enable Comet acceleration for `And` |
true |
+| `spark.comet.expression.ArrayAppend.enabled` | Enable Comet acceleration for
`ArrayAppend` | true |
+| `spark.comet.expression.ArrayCompact.enabled` | Enable Comet acceleration
for `ArrayCompact` | true |
+| `spark.comet.expression.ArrayContains.enabled` | Enable Comet acceleration
for `ArrayContains` | true |
+| `spark.comet.expression.ArrayDistinct.enabled` | Enable Comet acceleration
for `ArrayDistinct` | true |
+| `spark.comet.expression.ArrayExcept.enabled` | Enable Comet acceleration for
`ArrayExcept` | true |
+| `spark.comet.expression.ArrayFilter.enabled` | Enable Comet acceleration for
`ArrayFilter` | true |
+| `spark.comet.expression.ArrayInsert.enabled` | Enable Comet acceleration for
`ArrayInsert` | true |
+| `spark.comet.expression.ArrayIntersect.enabled` | Enable Comet acceleration
for `ArrayIntersect` | true |
+| `spark.comet.expression.ArrayJoin.enabled` | Enable Comet acceleration for
`ArrayJoin` | true |
+| `spark.comet.expression.ArrayMax.enabled` | Enable Comet acceleration for
`ArrayMax` | true |
+| `spark.comet.expression.ArrayMin.enabled` | Enable Comet acceleration for
`ArrayMin` | true |
+| `spark.comet.expression.ArrayPosition.enabled` | Enable Comet acceleration
for `ArrayPosition` | true |
+| `spark.comet.expression.ArrayRemove.enabled` | Enable Comet acceleration for
`ArrayRemove` | true |
+| `spark.comet.expression.ArrayRepeat.enabled` | Enable Comet acceleration for
`ArrayRepeat` | true |
+| `spark.comet.expression.ArrayUnion.enabled` | Enable Comet acceleration for
`ArrayUnion` | true |
+| `spark.comet.expression.ArraysOverlap.enabled` | Enable Comet acceleration
for `ArraysOverlap` | true |
+| `spark.comet.expression.ArraysZip.enabled` | Enable Comet acceleration for
`ArraysZip` | true |
+| `spark.comet.expression.Ascii.enabled` | Enable Comet acceleration for
`Ascii` | true |
+| `spark.comet.expression.Asin.enabled` | Enable Comet acceleration for `Asin`
| true |
+| `spark.comet.expression.Asinh.enabled` | Enable Comet acceleration for
`Asinh` | true |
+| `spark.comet.expression.Atan.enabled` | Enable Comet acceleration for `Atan`
| true |
+| `spark.comet.expression.Atan2.enabled` | Enable Comet acceleration for
`Atan2` | true |
+| `spark.comet.expression.Atanh.enabled` | Enable Comet acceleration for
`Atanh` | true |
+| `spark.comet.expression.AttributeReference.enabled` | Enable Comet
acceleration for `AttributeReference` | true |
+| `spark.comet.expression.Bin.enabled` | Enable Comet acceleration for `Bin` |
true |
+| `spark.comet.expression.BitLength.enabled` | Enable Comet acceleration for
`BitLength` | true |
+| `spark.comet.expression.BitwiseAnd.enabled` | Enable Comet acceleration for
`BitwiseAnd` | true |
+| `spark.comet.expression.BitwiseCount.enabled` | Enable Comet acceleration
for `BitwiseCount` | true |
+| `spark.comet.expression.BitwiseGet.enabled` | Enable Comet acceleration for
`BitwiseGet` | true |
+| `spark.comet.expression.BitwiseNot.enabled` | Enable Comet acceleration for
`BitwiseNot` | true |
+| `spark.comet.expression.BitwiseOr.enabled` | Enable Comet acceleration for
`BitwiseOr` | true |
+| `spark.comet.expression.BitwiseXor.enabled` | Enable Comet acceleration for
`BitwiseXor` | true |
+| `spark.comet.expression.BloomFilterMightContain.enabled` | Enable Comet
acceleration for `BloomFilterMightContain` | true |
+| `spark.comet.expression.CaseWhen.enabled` | Enable Comet acceleration for
`CaseWhen` | true |
+| `spark.comet.expression.Cast.enabled` | Enable Comet acceleration for `Cast`
| true |
+| `spark.comet.expression.Cbrt.enabled` | Enable Comet acceleration for `Cbrt`
| true |
+| `spark.comet.expression.Ceil.enabled` | Enable Comet acceleration for `Ceil`
| true |
+| `spark.comet.expression.CheckOverflow.enabled` | Enable Comet acceleration
for `CheckOverflow` | true |
+| `spark.comet.expression.Chr.enabled` | Enable Comet acceleration for `Chr` |
true |
+| `spark.comet.expression.Coalesce.enabled` | Enable Comet acceleration for
`Coalesce` | true |
+| `spark.comet.expression.Concat.enabled` | Enable Comet acceleration for
`Concat` | true |
+| `spark.comet.expression.ConcatWs.enabled` | Enable Comet acceleration for
`ConcatWs` | true |
+| `spark.comet.expression.Contains.enabled` | Enable Comet acceleration for
`Contains` | true |
+| `spark.comet.expression.Cos.enabled` | Enable Comet acceleration for `Cos` |
true |
+| `spark.comet.expression.Cosh.enabled` | Enable Comet acceleration for `Cosh`
| true |
+| `spark.comet.expression.Cot.enabled` | Enable Comet acceleration for `Cot` |
true |
+| `spark.comet.expression.Crc32.enabled` | Enable Comet acceleration for
`Crc32` | true |
+| `spark.comet.expression.CreateArray.enabled` | Enable Comet acceleration for
`CreateArray` | true |
+| `spark.comet.expression.CreateNamedStruct.enabled` | Enable Comet
acceleration for `CreateNamedStruct` | true |
+| `spark.comet.expression.DateAdd.enabled` | Enable Comet acceleration for
`DateAdd` | true |
+| `spark.comet.expression.DateDiff.enabled` | Enable Comet acceleration for
`DateDiff` | true |
+| `spark.comet.expression.DateFormatClass.enabled` | Enable Comet acceleration
for `DateFormatClass` | true |
+| `spark.comet.expression.DateFromUnixDate.enabled` | Enable Comet
acceleration for `DateFromUnixDate` | true |
+| `spark.comet.expression.DateSub.enabled` | Enable Comet acceleration for
`DateSub` | true |
+| `spark.comet.expression.DayOfMonth.enabled` | Enable Comet acceleration for
`DayOfMonth` | true |
+| `spark.comet.expression.DayOfWeek.enabled` | Enable Comet acceleration for
`DayOfWeek` | true |
+| `spark.comet.expression.DayOfYear.enabled` | Enable Comet acceleration for
`DayOfYear` | true |
+| `spark.comet.expression.Days.enabled` | Enable Comet acceleration for `Days`
| true |
+| `spark.comet.expression.Divide.enabled` | Enable Comet acceleration for
`Divide` | true |
+| `spark.comet.expression.ElementAt.enabled` | Enable Comet acceleration for
`ElementAt` | true |
+| `spark.comet.expression.EndsWith.enabled` | Enable Comet acceleration for
`EndsWith` | true |
+| `spark.comet.expression.EqualNullSafe.enabled` | Enable Comet acceleration
for `EqualNullSafe` | true |
+| `spark.comet.expression.EqualTo.enabled` | Enable Comet acceleration for
`EqualTo` | true |
+| `spark.comet.expression.Exp.enabled` | Enable Comet acceleration for `Exp` |
true |
+| `spark.comet.expression.Expm1.enabled` | Enable Comet acceleration for
`Expm1` | true |
+| `spark.comet.expression.Flatten.enabled` | Enable Comet acceleration for
`Flatten` | true |
+| `spark.comet.expression.Floor.enabled` | Enable Comet acceleration for
`Floor` | true |
+| `spark.comet.expression.FromUnixTime.enabled` | Enable Comet acceleration
for `FromUnixTime` | true |
+| `spark.comet.expression.GetArrayItem.enabled` | Enable Comet acceleration
for `GetArrayItem` | true |
+| `spark.comet.expression.GetArrayStructFields.enabled` | Enable Comet
acceleration for `GetArrayStructFields` | true |
+| `spark.comet.expression.GetJsonObject.enabled` | Enable Comet acceleration
for `GetJsonObject` | true |
+| `spark.comet.expression.GetMapValue.enabled` | Enable Comet acceleration for
`GetMapValue` | true |
+| `spark.comet.expression.GetStructField.enabled` | Enable Comet acceleration
for `GetStructField` | true |
+| `spark.comet.expression.GreaterThan.enabled` | Enable Comet acceleration for
`GreaterThan` | true |
+| `spark.comet.expression.GreaterThanOrEqual.enabled` | Enable Comet
acceleration for `GreaterThanOrEqual` | true |
+| `spark.comet.expression.Hex.enabled` | Enable Comet acceleration for `Hex` |
true |
+| `spark.comet.expression.Hour.enabled` | Enable Comet acceleration for `Hour`
| true |
+| `spark.comet.expression.Hours.enabled` | Enable Comet acceleration for
`Hours` | true |
+| `spark.comet.expression.If.enabled` | Enable Comet acceleration for `If` |
true |
+| `spark.comet.expression.In.enabled` | Enable Comet acceleration for `In` |
true |
+| `spark.comet.expression.InSet.enabled` | Enable Comet acceleration for
`InSet` | true |
+| `spark.comet.expression.InitCap.enabled` | Enable Comet acceleration for
`InitCap` | true |
+| `spark.comet.expression.IntegralDivide.enabled` | Enable Comet acceleration
for `IntegralDivide` | true |
+| `spark.comet.expression.IsNaN.enabled` | Enable Comet acceleration for
`IsNaN` | true |
+| `spark.comet.expression.IsNotNull.enabled` | Enable Comet acceleration for
`IsNotNull` | true |
+| `spark.comet.expression.IsNull.enabled` | Enable Comet acceleration for
`IsNull` | true |
+| `spark.comet.expression.JsonToStructs.enabled` | Enable Comet acceleration
for `JsonToStructs` | true |
+| `spark.comet.expression.KnownFloatingPointNormalized.enabled` | Enable Comet
acceleration for `KnownFloatingPointNormalized` | true |
+| `spark.comet.expression.LastDay.enabled` | Enable Comet acceleration for
`LastDay` | true |
+| `spark.comet.expression.Left.enabled` | Enable Comet acceleration for `Left`
| true |
+| `spark.comet.expression.Length.enabled` | Enable Comet acceleration for
`Length` | true |
+| `spark.comet.expression.LessThan.enabled` | Enable Comet acceleration for
`LessThan` | true |
+| `spark.comet.expression.LessThanOrEqual.enabled` | Enable Comet acceleration
for `LessThanOrEqual` | true |
+| `spark.comet.expression.Like.enabled` | Enable Comet acceleration for `Like`
| true |
+| `spark.comet.expression.Literal.enabled` | Enable Comet acceleration for
`Literal` | true |
+| `spark.comet.expression.Log.enabled` | Enable Comet acceleration for `Log` |
true |
+| `spark.comet.expression.Log10.enabled` | Enable Comet acceleration for
`Log10` | true |
+| `spark.comet.expression.Log2.enabled` | Enable Comet acceleration for `Log2`
| true |
+| `spark.comet.expression.Logarithm.enabled` | Enable Comet acceleration for
`Logarithm` | true |
+| `spark.comet.expression.Lower.enabled` | Enable Comet acceleration for
`Lower` | true |
+| `spark.comet.expression.MakeDate.enabled` | Enable Comet acceleration for
`MakeDate` | true |
+| `spark.comet.expression.MakeDecimal.enabled` | Enable Comet acceleration for
`MakeDecimal` | true |
+| `spark.comet.expression.MapContainsKey.enabled` | Enable Comet acceleration
for `MapContainsKey` | true |
+| `spark.comet.expression.MapEntries.enabled` | Enable Comet acceleration for
`MapEntries` | true |
+| `spark.comet.expression.MapFromArrays.enabled` | Enable Comet acceleration
for `MapFromArrays` | true |
+| `spark.comet.expression.MapFromEntries.enabled` | Enable Comet acceleration
for `MapFromEntries` | true |
+| `spark.comet.expression.MapKeys.enabled` | Enable Comet acceleration for
`MapKeys` | true |
+| `spark.comet.expression.MapValues.enabled` | Enable Comet acceleration for
`MapValues` | true |
+| `spark.comet.expression.Md5.enabled` | Enable Comet acceleration for `Md5` |
true |
+| `spark.comet.expression.Minute.enabled` | Enable Comet acceleration for
`Minute` | true |
+| `spark.comet.expression.MonotonicallyIncreasingID.enabled` | Enable Comet
acceleration for `MonotonicallyIncreasingID` | true |
+| `spark.comet.expression.Month.enabled` | Enable Comet acceleration for
`Month` | true |
+| `spark.comet.expression.Multiply.enabled` | Enable Comet acceleration for
`Multiply` | true |
+| `spark.comet.expression.Murmur3Hash.enabled` | Enable Comet acceleration for
`Murmur3Hash` | true |
+| `spark.comet.expression.NextDay.enabled` | Enable Comet acceleration for
`NextDay` | true |
+| `spark.comet.expression.Not.enabled` | Enable Comet acceleration for `Not` |
true |
+| `spark.comet.expression.OctetLength.enabled` | Enable Comet acceleration for
`OctetLength` | true |
+| `spark.comet.expression.Or.enabled` | Enable Comet acceleration for `Or` |
true |
+| `spark.comet.expression.Pi.enabled` | Enable Comet acceleration for `Pi` |
true |
+| `spark.comet.expression.Pow.enabled` | Enable Comet acceleration for `Pow` |
true |
+| `spark.comet.expression.Quarter.enabled` | Enable Comet acceleration for
`Quarter` | true |
+| `spark.comet.expression.RLike.enabled` | Enable Comet acceleration for
`RLike` | true |
+| `spark.comet.expression.Rand.enabled` | Enable Comet acceleration for `Rand`
| true |
+| `spark.comet.expression.Randn.enabled` | Enable Comet acceleration for
`Randn` | true |
+| `spark.comet.expression.RegExpReplace.enabled` | Enable Comet acceleration
for `RegExpReplace` | true |
+| `spark.comet.expression.Remainder.enabled` | Enable Comet acceleration for
`Remainder` | true |
+| `spark.comet.expression.Reverse.enabled` | Enable Comet acceleration for
`Reverse` | true |
+| `spark.comet.expression.Right.enabled` | Enable Comet acceleration for
`Right` | true |
+| `spark.comet.expression.Round.enabled` | Enable Comet acceleration for
`Round` | true |
+| `spark.comet.expression.ScalarSubquery.enabled` | Enable Comet acceleration
for `ScalarSubquery` | true |
+| `spark.comet.expression.Second.enabled` | Enable Comet acceleration for
`Second` | true |
+| `spark.comet.expression.SecondsToTimestamp.enabled` | Enable Comet
acceleration for `SecondsToTimestamp` | true |
+| `spark.comet.expression.Sha1.enabled` | Enable Comet acceleration for `Sha1`
| true |
+| `spark.comet.expression.Sha2.enabled` | Enable Comet acceleration for `Sha2`
| true |
+| `spark.comet.expression.ShiftLeft.enabled` | Enable Comet acceleration for
`ShiftLeft` | true |
+| `spark.comet.expression.ShiftRight.enabled` | Enable Comet acceleration for
`ShiftRight` | true |
+| `spark.comet.expression.Signum.enabled` | Enable Comet acceleration for
`Signum` | true |
+| `spark.comet.expression.Sin.enabled` | Enable Comet acceleration for `Sin` |
true |
+| `spark.comet.expression.Sinh.enabled` | Enable Comet acceleration for `Sinh`
| true |
+| `spark.comet.expression.Size.enabled` | Enable Comet acceleration for `Size`
| true |
+| `spark.comet.expression.SortArray.enabled` | Enable Comet acceleration for
`SortArray` | true |
+| `spark.comet.expression.SortOrder.enabled` | Enable Comet acceleration for
`SortOrder` | true |
+| `spark.comet.expression.SparkPartitionID.enabled` | Enable Comet
acceleration for `SparkPartitionID` | true |
+| `spark.comet.expression.Sqrt.enabled` | Enable Comet acceleration for `Sqrt`
| true |
+| `spark.comet.expression.StartsWith.enabled` | Enable Comet acceleration for
`StartsWith` | true |
+| `spark.comet.expression.StaticInvoke.enabled` | Enable Comet acceleration
for `StaticInvoke` | true |
+| `spark.comet.expression.StringInstr.enabled` | Enable Comet acceleration for
`StringInstr` | true |
+| `spark.comet.expression.StringLPad.enabled` | Enable Comet acceleration for
`StringLPad` | true |
+| `spark.comet.expression.StringRPad.enabled` | Enable Comet acceleration for
`StringRPad` | true |
+| `spark.comet.expression.StringRepeat.enabled` | Enable Comet acceleration
for `StringRepeat` | true |
+| `spark.comet.expression.StringReplace.enabled` | Enable Comet acceleration
for `StringReplace` | true |
+| `spark.comet.expression.StringSpace.enabled` | Enable Comet acceleration for
`StringSpace` | true |
+| `spark.comet.expression.StringSplit.enabled` | Enable Comet acceleration for `StringSplit` | true |
+| `spark.comet.expression.StringToMap.enabled` | Enable Comet acceleration for `StringToMap` | true |
+| `spark.comet.expression.StringTranslate.enabled` | Enable Comet acceleration for `StringTranslate` | true |
+| `spark.comet.expression.StringTrim.enabled` | Enable Comet acceleration for `StringTrim` | true |
+| `spark.comet.expression.StringTrimBoth.enabled` | Enable Comet acceleration for `StringTrimBoth` | true |
+| `spark.comet.expression.StringTrimLeft.enabled` | Enable Comet acceleration for `StringTrimLeft` | true |
+| `spark.comet.expression.StringTrimRight.enabled` | Enable Comet acceleration for `StringTrimRight` | true |
+| `spark.comet.expression.StructsToCsv.enabled` | Enable Comet acceleration for `StructsToCsv` | true |
+| `spark.comet.expression.StructsToJson.enabled` | Enable Comet acceleration for `StructsToJson` | true |
+| `spark.comet.expression.Substring.enabled` | Enable Comet acceleration for `Substring` | true |
+| `spark.comet.expression.Subtract.enabled` | Enable Comet acceleration for `Subtract` | true |
+| `spark.comet.expression.Tan.enabled` | Enable Comet acceleration for `Tan` | true |
+| `spark.comet.expression.Tanh.enabled` | Enable Comet acceleration for `Tanh` | true |
+| `spark.comet.expression.ToDegrees.enabled` | Enable Comet acceleration for `ToDegrees` | true |
+| `spark.comet.expression.ToRadians.enabled` | Enable Comet acceleration for `ToRadians` | true |
+| `spark.comet.expression.TruncDate.enabled` | Enable Comet acceleration for `TruncDate` | true |
+| `spark.comet.expression.TruncTimestamp.enabled` | Enable Comet acceleration for `TruncTimestamp` | true |
+| `spark.comet.expression.UnaryMinus.enabled` | Enable Comet acceleration for `UnaryMinus` | true |
+| `spark.comet.expression.Unhex.enabled` | Enable Comet acceleration for `Unhex` | true |
+| `spark.comet.expression.UnixDate.enabled` | Enable Comet acceleration for `UnixDate` | true |
+| `spark.comet.expression.UnixTimestamp.enabled` | Enable Comet acceleration for `UnixTimestamp` | true |
+| `spark.comet.expression.UnscaledValue.enabled` | Enable Comet acceleration for `UnscaledValue` | true |
+| `spark.comet.expression.Upper.enabled` | Enable Comet acceleration for `Upper` | true |
+| `spark.comet.expression.WeekDay.enabled` | Enable Comet acceleration for `WeekDay` | true |
+| `spark.comet.expression.WeekOfYear.enabled` | Enable Comet acceleration for `WeekOfYear` | true |
+| `spark.comet.expression.XxHash64.enabled` | Enable Comet acceleration for `XxHash64` | true |
+| `spark.comet.expression.Year.enabled` | Enable Comet acceleration for `Year` | true |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
## Enabling or Disabling Individual Aggregate Expressions
<!--BEGIN:CONFIG_TABLE[enable_agg_expr]-->
+<!-- prettier-ignore-start -->
+| Config | Description | Default Value |
+|--------|-------------|---------------|
+| `spark.comet.expression.Average.enabled` | Enable Comet acceleration for `Average` | true |
+| `spark.comet.expression.BitAndAgg.enabled` | Enable Comet acceleration for `BitAndAgg` | true |
+| `spark.comet.expression.BitOrAgg.enabled` | Enable Comet acceleration for `BitOrAgg` | true |
+| `spark.comet.expression.BitXorAgg.enabled` | Enable Comet acceleration for `BitXorAgg` | true |
+| `spark.comet.expression.BloomFilterAggregate.enabled` | Enable Comet acceleration for `BloomFilterAggregate` | true |
+| `spark.comet.expression.CollectSet.enabled` | Enable Comet acceleration for `CollectSet` | true |
+| `spark.comet.expression.Corr.enabled` | Enable Comet acceleration for `Corr` | true |
+| `spark.comet.expression.Count.enabled` | Enable Comet acceleration for `Count` | true |
+| `spark.comet.expression.CovPopulation.enabled` | Enable Comet acceleration for `CovPopulation` | true |
+| `spark.comet.expression.CovSample.enabled` | Enable Comet acceleration for `CovSample` | true |
+| `spark.comet.expression.First.enabled` | Enable Comet acceleration for `First` | true |
+| `spark.comet.expression.Last.enabled` | Enable Comet acceleration for `Last` | true |
+| `spark.comet.expression.Max.enabled` | Enable Comet acceleration for `Max` | true |
+| `spark.comet.expression.Min.enabled` | Enable Comet acceleration for `Min` | true |
+| `spark.comet.expression.StddevPop.enabled` | Enable Comet acceleration for `StddevPop` | true |
+| `spark.comet.expression.StddevSamp.enabled` | Enable Comet acceleration for `StddevSamp` | true |
+| `spark.comet.expression.Sum.enabled` | Enable Comet acceleration for `Sum` | true |
+| `spark.comet.expression.VariancePop.enabled` | Enable Comet acceleration for `VariancePop` | true |
+| `spark.comet.expression.VarianceSamp.enabled` | Enable Comet acceleration for `VarianceSamp` | true |
+<!-- prettier-ignore-end -->
<!--END:CONFIG_TABLE-->
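
Every switch in the tables above follows the pattern `spark.comet.expression.<Name>.enabled` and defaults to `true`, so turning one expression off means setting its key to `false`. The sketch below builds such keys and the matching `--conf key=value` arguments for `spark-submit`. The helper names `expression_flag` and `spark_submit_args` are hypothetical and exist only for this illustration; they are not part of Comet or Spark.

```python
def expression_flag(expression: str, enabled: bool) -> tuple[str, str]:
    """Return the (key, value) pair for one per-expression switch.

    The key pattern is taken from the config tables; this helper itself
    is an illustrative assumption, not a Comet API.
    """
    return (f"spark.comet.expression.{expression}.enabled", str(enabled).lower())


def spark_submit_args(disabled: list[str]) -> list[str]:
    """Build `--conf key=value` arguments that turn off the listed expressions."""
    args: list[str] = []
    for name in disabled:
        key, value = expression_flag(name, False)
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: send `Sum` and `CollectSet` back to Spark's own implementation.
    print(" ".join(spark_submit_args(["Sum", "CollectSet"])))
```

The same key and value pairs can be supplied wherever Spark configuration is normally set, for example when building a session.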