andygrove commented on code in PR #4550:
URL: https://github.com/apache/datafusion-comet/pull/4550#discussion_r3337283368
##########
docs/source/user-guide/latest/expressions.md:
##########
@@ -17,354 +17,652 @@
under the License.
-->
-# Supported Spark Expressions
-
-Comet supports the following Spark expressions. See the [Comet Compatibility Guide] for details on known
-incompatibilities and unsupported cases.
-
-All expressions are enabled by default, but most can be disabled by setting
-`spark.comet.expression.EXPRNAME.enabled=false`, where `EXPRNAME` is the expression name as specified in
-the following tables, such as `Length`, or `StartsWith`. See the [Comet Configuration Guide] for a full list
-of expressions that be disabled.
-
-## Conditional Expressions
-
-| Expression | SQL |
-| ---------- | ------------------------------------------- |
-| CaseWhen | `CASE WHEN expr THEN expr ELSE expr END` |
-| If | `IF(predicate_expr, true_expr, false_expr)` |
-
-## Predicate Expressions
-
-| Expression | SQL |
-| ------------------ | ------------- |
-| And | `AND` |
-| EqualTo | `=` |
-| EqualNullSafe | `<=>` |
-| GreaterThan | `>` |
-| GreaterThanOrEqual | `>=` |
-| ILike | `ILIKE` |
-| In | `IN` |
-| InSet | `IN (...)` |
-| IsNotNull | `IS NOT NULL` |
-| IsNull | `IS NULL` |
-| LessThan | `<` |
-| LessThanOrEqual | `<=` |
-| Not | `NOT` |
-| Or | `OR` |
-
-## String Functions
-
-| Expression |
-| --------------- |
-| Ascii |
-| BitLength |
-| Chr |
-| Concat |
-| ConcatWs |
-| Contains |
-| Decode |
-| EndsWith |
-| InitCap |
-| Left |
-| Length |
-| Like |
-| Lower |
-| OctetLength |
-| Reverse |
-| Right |
-| RLike |
-| Split |
-| StartsWith |
-| StringInstr |
-| StringRepeat |
-| StringReplace |
-| StringLPad |
-| StringRPad |
-| StringSpace |
-| StringTranslate |
-| StringTrim |
-| StringTrimBoth |
-| StringTrimLeft |
-| StringTrimRight |
-| Substring |
-| SubstringIndex |
-| Upper |
-
-## JSON Functions
-
-| Expression |
-| ------------- |
-| GetJsonObject |
-
-## Date/Time Functions
-
-| Expression | SQL |
-| ----------------- | ---------------------------- |
-| AddMonths | `add_months` |
-| ConvertTimezone | `convert_timezone` |
-| CurrentTimeZone | `current_timezone` |
-| DateAdd | `date_add` |
-| DateDiff | `datediff` |
-| DateFormat | `date_format` |
-| DateFromUnixDate | `date_from_unix_date` |
-| DateSub | `date_sub` |
-| DatePart | `date_part(field, source)` |
-| Days | `days` |
-| Extract | `extract(field FROM source)` |
-| FromUnixTime | `from_unixtime` |
-| Hour | `hour` |
-| LastDay | `last_day` |
-| LocalTimestamp | `localtimestamp` |
-| MakeDate | `make_date` |
-| MakeTime | `make_time` |
-| MakeTimestamp | `make_timestamp` |
-| MicrosToTimestamp | `timestamp_micros` |
-| MillisToTimestamp | `timestamp_millis` |
-| Minute | `minute` |
-| MonthsBetween | `months_between` |
-| NextDay | `next_day` |
-| Second | `second` |
-| TimestampSeconds | `timestamp_seconds` |
-| ToUnixTimestamp | `to_unix_timestamp` |
-| TruncDate | `trunc` |
-| TruncTimestamp | `date_trunc` |
-| UnixDate | `unix_date` |
-| UnixMicros | `unix_micros` |
-| UnixMillis | `unix_millis` |
-| UnixSeconds | `unix_seconds` |
-| UnixTimestamp | `unix_timestamp` |
-| Year | `year` |
-| Month | `month` |
-| DayOfMonth | `day`/`dayofmonth` |
-| DayOfWeek | `dayofweek` |
-| WeekDay | `weekday` |
-| DayOfYear | `dayofyear` |
-| WeekOfYear | `weekofyear` |
-| Quarter | `quarter` |
-| ToTime | `to_time` |
-| TryToTime | `try_to_time` |
-
-## Math Expressions
-
-| Expression | SQL |
-| -------------- | -------------- |
-| Abs | `abs` |
-| Acos | `acos` |
-| Acosh | `acosh` |
-| Add | `+` |
-| Asin | `asin` |
-| Asinh | `asinh` |
-| Atan | `atan` |
-| Atan2 | `atan2` |
-| Atanh | `atanh` |
-| Bin | `bin` |
-| BRound | `bround` |
-| Cbrt | `cbrt` |
-| Ceil | `ceil` |
-| Cos | `cos` |
-| Cosh | `cosh` |
-| Cot | `cot` |
-| Csc | `csc` |
-| Divide | `/` |
-| Exp | `exp` |
-| Expm1 | `expm1` |
-| Factorial | `factorial` |
-| Floor | `floor` |
-| Hex | `hex` |
-| IntegralDivide | `div` |
-| IsNaN | `isnan` |
-| Log | `log` |
-| Log2 | `log2` |
-| Log10 | `log10` |
-| Multiply | `*` |
-| Pi | `pi` |
-| Pow | `power` |
-| Rand | `rand` |
-| Randn | `randn` |
-| Remainder | `%` |
-| Rint | `rint` |
-| Round | `round` |
-| Sec | `sec` |
-| Signum | `signum` |
-| Sin | `sin` |
-| Sinh | `sinh` |
-| Sqrt | `sqrt` |
-| Subtract | `-` |
-| Tan | `tan` |
-| Tanh | `tanh` |
-| ToDegrees | `degrees` |
-| ToRadians | `radians` |
-| TryAdd | `try_add` |
-| TryDivide | `try_div` |
-| TryMultiply | `try_mul` |
-| TrySubtract | `try_sub` |
-| UnaryMinus | `-` |
-| Unhex | `unhex` |
-| WidthBucket | `width_bucket` |
-
-## Hashing Functions
-
-| Expression |
-| ----------- |
-| Crc32 |
-| Md5 |
-| Murmur3Hash |
-| Sha1 |
-| Sha2 |
-| XxHash64 |
-
-## Bitwise Expressions
-
-| Expression | SQL |
-| ------------------ | ----- |
-| BitwiseAnd | `&` |
-| BitwiseCount | |
-| BitwiseGet | |
-| BitwiseOr | `\|` |
-| BitwiseNot | `~` |
-| BitwiseXor | `^` |
-| ShiftLeft | `<<` |
-| ShiftRight | `>>` |
-| ShiftRightUnsigned | `>>>` |
-
-## Aggregate Expressions
-
-| Expression | SQL |
-| ------------- | ---------- |
-| Average | |
-| BitAndAgg | |
-| BitOrAgg | |
-| BitXorAgg | |
-| BoolAnd | `bool_and` |
-| BoolOr | `bool_or` |
-| CollectSet | |
-| Corr | |
-| Count | |
-| CountIf | `count_if` |
-| CovPopulation | |
-| CovSample | |
-| First | |
-| Last | |
-| Max | |
-| Min | |
-| StddevPop | |
-| StddevSamp | |
-| Sum | |
-| VariancePop | |
-| VarianceSamp | |
-
-## Window Functions
-
-```{warning}
-Window support is disabled by default due to known correctness issues. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721).
-```
-
-Comet supports using the following aggregate functions within window contexts with PARTITION BY and ORDER BY clauses.
-
-| Expression |
-| ---------- |
-| Count |
-| Max |
-| Min |
-| Sum |
-
-**Note:** Dedicated window functions such as `rank`, `dense_rank`, `row_number`, `lag`, `lead`, `ntile`, `cume_dist`, `percent_rank`, and `nth_value` are not currently supported and will fall back to Spark.
-
-## Array Expressions
-
-| Expression |
-| -------------- |
-| ArrayAppend |
-| ArrayCompact |
-| ArrayContains |
-| ArrayDistinct |
-| ArrayExcept |
-| ArrayFilter |
-| ArrayInsert |
-| ArrayIntersect |
-| ArrayJoin |
-| ArrayMax |
-| ArrayMin |
-| ArrayPosition |
-| ArrayRemove |
-| ArrayRepeat |
-| ArraysZip |
-| ArrayUnion |
-| ArraysOverlap |
-| CreateArray |
-| ElementAt |
-| Flatten |
-| GetArrayItem |
-| Size |
-| SortArray |
-
-## Map Expressions
-
-| Expression |
-| -------------- |
-| GetMapValue |
-| MapContainsKey |
-| MapEntries |
-| MapFromArrays |
-| MapFromEntries |
-| MapKeys |
-| MapValues |
-| StringToMap |
-
-## Struct Expressions
-
-| Expression |
-| -------------------- |
-| CreateNamedStruct |
-| GetArrayStructFields |
-| GetStructField |
-| JsonToStructs |
-| StructsToJson |
-
-## URL Functions
-
-| Expression |
-| ------------ |
-| TryUrlDecode |
-| UrlDecode |
-| UrlEncode |
-
-## Conversion Expressions
-
-| Expression |
-| ---------- |
-| Cast |
-
-## SortOrder
-
-| Expression |
-| ---------- |
-| NullsFirst |
-| NullsLast |
-| Ascending |
-| Descending |
-
-## Other
-
-| Expression |
-| ---------------------------- |
-| Alias |
-| AttributeReference |
-| BloomFilterMightContain |
-| Coalesce |
-| CheckOverflow |
-| KnownFloatingPointNormalized |
-| Literal |
-| MakeDecimal |
-| MonotonicallyIncreasingID |
-| NormalizeNaNAndZero |
-| PromotePrecision |
-| RegExpReplace |
-| ScalarSubquery |
-| SparkPartitionID |
-| ToPrettyString |
-| UnscaledValue |
-
-[Comet Configuration Guide]: configs.md
-[Comet Compatibility Guide]: compatibility/expressions/index.md
+# Spark Expression Support
+
+This page is the complete reference for how Apache Comet handles each Spark built-in
+expression. Comet accelerates expressions either with a native (Rust) implementation or by
+dispatching to a Spark-compatible codegen path. When an expression is not supported, Comet
+transparently falls back to Spark for that part of the plan; results are unaffected.
+
+Expressions marked ✅ Supported are enabled by default. Expressions marked ⚠️ Supported
+(caveats) include cases that are known to diverge from Spark; those cases fall back to Spark
+by default and must be opted into per expression with
+`spark.comet.expression.EXPRNAME.allowIncompatible=true` (where `EXPRNAME` is the Spark
+expression class name, for example `Cast`). There is no global opt-in.
+
+Most expressions can also be disabled with `spark.comet.expression.EXPRNAME.enabled=false`, where
+`EXPRNAME` is the Spark expression class name (for example `Length` or `StartsWith`). See the
+[Comet Configuration Guide](configs.md) for the full list.
+
+## Status legend
+
+| Status                 | Meaning |
+| ---------------------- | ------- |
+| ✅ Supported           | Native or codegen path; compatible with Spark by default. |
+| ⚠️ Supported (caveats) | Works, but may diverge from Spark in some cases: incompatible, flag-gated (`allowIncompatible`), or restricted to certain types. See the [Compatibility Guide](compatibility/index.md). |
+| 🔜 Planned             | Intended; tracked by an open issue or pull request. |
+| 🚫 Out of scope        | Deliberately not planned. |
+
+## Out of scope
+
+Comet focuses acceleration on mainstream relational, string, datetime, math, and collection
+expressions. Some Spark function families are **out of scope**: specialized functionality with
+narrow real-world analytics use and high implementation cost. These will fall back to Spark and
+are not on the roadmap:
+
+- **Probabilistic sketches and approximate top-k** (`kll_sketch_*`, `hll_*`, `theta_*`, `count_min_sketch`, `bitmap_*`, `approx_top_k*`): specialized data structures with exact-correctness traps.
Review Comment:
Do you think we can implement these natively with 100% compatibility? I
haven't looked into this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]