This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new f75aeefa docs: Improve user documentation for supported operators and
expressions (#520)
f75aeefa is described below
commit f75aeefab58dc7e14cb70742b9d7bb656b727dbd
Author: Andy Grove <[email protected]>
AuthorDate: Fri Jun 7 15:15:23 2024 -0600
docs: Improve user documentation for supported operators and expressions
(#520)
* Improve documentation about supported operators and expressions
* Improve documentation about supported operators and expressions
* more notes
* Add more supported expressions
* rename protobuf Negative to UnaryMinus for consistency
* format
* remove duplicate ASF header
* SMJ not disabled by default
* Update docs/source/user-guide/operators.md
Co-authored-by: Liang-Chi Hsieh <[email protected]>
* Update docs/source/user-guide/operators.md
Co-authored-by: Liang-Chi Hsieh <[email protected]>
* remove RLike
---------
Co-authored-by: Liang-Chi Hsieh <[email protected]>
---
core/src/execution/datafusion/planner.rs | 2 +-
core/src/execution/proto/expr.proto | 4 +-
docs/source/user-guide/expressions.md | 267 +++++++++++++--------
docs/source/user-guide/operators.md | 27 ++-
.../org/apache/comet/serde/QueryPlanSerde.scala | 4 +-
5 files changed, 192 insertions(+), 112 deletions(-)
diff --git a/core/src/execution/datafusion/planner.rs
b/core/src/execution/datafusion/planner.rs
index 7d504f87..6c7ea0de 100644
--- a/core/src/execution/datafusion/planner.rs
+++ b/core/src/execution/datafusion/planner.rs
@@ -566,7 +566,7 @@ impl PhysicalPlanner {
let child = self.create_expr(expr.child.as_ref().unwrap(),
input_schema)?;
Ok(Arc::new(NotExpr::new(child)))
}
- ExprStruct::Negative(expr) => {
+ ExprStruct::UnaryMinus(expr) => {
let child: Arc<dyn PhysicalExpr> =
self.create_expr(expr.child.as_ref().unwrap(),
input_schema.clone())?;
let result = negative::create_negate_expr(child,
expr.fail_on_error);
diff --git a/core/src/execution/proto/expr.proto
b/core/src/execution/proto/expr.proto
index 9c604901..5192bbd4 100644
--- a/core/src/execution/proto/expr.proto
+++ b/core/src/execution/proto/expr.proto
@@ -65,7 +65,7 @@ message Expr {
CaseWhen caseWhen = 38;
In in = 39;
Not not = 40;
- Negative negative = 41;
+ UnaryMinus unary_minus = 41;
BitwiseShiftRight bitwiseShiftRight = 42;
BitwiseShiftLeft bitwiseShiftLeft = 43;
IfExpr if = 44;
@@ -452,7 +452,7 @@ message Not {
Expr child = 1;
}
-message Negative {
+message UnaryMinus {
Expr child = 1;
bool fail_on_error = 2;
}
diff --git a/docs/source/user-guide/expressions.md
b/docs/source/user-guide/expressions.md
index 775ebedb..14b6f18d 100644
--- a/docs/source/user-guide/expressions.md
+++ b/docs/source/user-guide/expressions.md
@@ -19,99 +19,174 @@
# Supported Spark Expressions
-The following Spark expressions are currently available:
-
-- Literals
-- Arithmetic Operators
- - UnaryMinus
- - Add/Minus/Multiply/Divide/Remainder
-- Conditional functions
- - Case When
- - If
-- Cast
-- Coalesce
-- BloomFilterMightContain
-- Boolean functions
- - And
- - Or
- - Not
- - EqualTo
- - EqualNullSafe
- - GreaterThan
- - GreaterThanOrEqual
- - LessThan
- - LessThanOrEqual
- - IsNull
- - IsNotNull
- - In
-- String functions
- - Substring
- - Coalesce
- - StringSpace
- - Like
- - Contains
- - Startswith
- - Endswith
- - Ascii
- - Bit_length
- - Octet_length
- - Upper
- - Lower
- - Chr
- - Initcap
- - Trim/Btrim/Ltrim/Rtrim
- - Concat_ws
- - Repeat
- - Length
- - Reverse
- - Instr
- - Replace
- - Translate
-- Bitwise functions
- - Shiftright/Shiftleft
-- Date/Time functions
- - Year/Hour/Minute/Second
-- Hash functions
- - Md5
- - Sha2
- - Hash
- - Xxhash64
-- Math functions
- - Abs
- - Acos
- - Asin
- - Atan
- - Atan2
- - Cos
- - Exp
- - Ln
- - Log10
- - Log2
- - Pow
- - Round
- - Signum
- - Sin
- - Sqrt
- - Tan
- - Ceil
- - Floor
-- Aggregate functions
- - Count
- - Sum
- - Max
- - Min
- - Avg
- - First
- - Last
- - BitAnd
- - BitOr
- - BitXor
- - BoolAnd
- - BoolOr
- - CovPopulation
- - CovSample
- - VariancePop
- - VarianceSamp
- - StddevPop
- - StddevSamp
- - Corr
+The following Spark expressions are currently available. Any known
compatibility issues are noted in the following tables.
+
+## Literal Values
+
+| Expression | Notes |
+| -------------------------------------- | ----- |
+| Literal values of supported data types | |
+
+## Unary Arithmetic
+
+| Expression | Notes |
+| ---------------- | ----- |
+| UnaryMinus (`-`) | |
+
+## Binary Arithmeticx
+
+| Expression | Notes |
+| --------------- | --------------------------------------------------- |
+| Add (`+`) | |
+| Subtract (`-`) | |
+| Multiply (`*`) | |
+| Divide (`/`) | |
+| Remainder (`%`) | Comet produces `NaN` instead of `NULL` for `% -0.0` |
+
+## Conditional Expressions
+
+| Expression | Notes |
+| ---------- | ----- |
+| CaseWhen | |
+| If | |
+
+## Comparison
+
+| Expression | Notes |
+| ------------------------- | ----- |
+| EqualTo (`=`) | |
+| EqualNullSafe (`<=>`) | |
+| GreaterThan (`>`) | |
+| GreaterThanOrEqual (`>=`) | |
+| LessThan (`<`) | |
+| LessThanOrEqual (`<=`) | |
+| IsNull (`IS NULL`) | |
+| IsNotNull (`IS NOT NULL`) | |
+| In (`IN`) | |
+
+## String Functions
+
+| Expression | Notes
|
+| --------------- |
-----------------------------------------------------------------------------------------------------------
|
+| Ascii |
|
+| BitLength |
|
+| Chr |
|
+| ConcatWs |
|
+| Contains |
|
+| EndsWith |
|
+| InitCap |
|
+| Instr |
|
+| Length |
|
+| Like |
|
+| Lower |
|
+| OctetLength |
|
+| Repeat | Negative argument for number of times to repeat causes
exception |
+| Replace |
|
+| Reverse |
|
+| StartsWith |
|
+| StringSpace |
|
+| StringTrim |
|
+| StringTrimBoth |
|
+| StringTrimLeft |
|
+| StringTrimRight |
|
+| Substring |
|
+| Translate |
|
+| Upper |
|
+
+## Date/Time Functions
+
+| Expression | Notes |
+| -------------- | ------------------------ |
+| DatePart | Only `year` is supported |
+| Extract | Only `year` is supported |
+| Hour | |
+| Minute | |
+| Second | |
+| TruncDate | |
+| TruncTimestamp | |
+| Year | |
+
+## Math Expressions
+
+| Expression | Notes
|
+| ---------- |
------------------------------------------------------------------- |
+| Abs |
|
+| Acos |
|
+| Asin |
|
+| Atan |
|
+| Atan2 |
|
+| Ceil |
|
+| Cos |
|
+| Exp |
|
+| Floor |
|
+| Log | log(0) will produce `-Infinity` unlike Spark which returns
`null` |
+| Log2 | log2(0) will produce `-Infinity` unlike Spark which returns
`null` |
+| Log10 | log10(0) will produce `-Infinity` unlike Spark which returns
`null` |
+| Pow |
|
+| Round |
|
+| Signum | Signum does not differentiate between `0.0` and `-0.0`
|
+| Sin |
|
+| Sqrt |
|
+| Tan |
|
+
+## Hashing Functions
+
+| Expression | Notes |
+| ---------- | ----- |
+| Md5 | |
+| Hash | |
+| Sha2 | |
+| XxHash64 | |
+
+## Boolean Expressions
+
+| Expression | Notes |
+| ---------- | ----- |
+| And | |
+| Or | |
+| Not | |
+
+## Bitwise Expressions
+
+| Expression | Notes |
+| -------------------- | ----- |
+| ShiftLeft (`<<`) | |
+| ShiftRight (`>>`) | |
+| BitAnd (`&`) | |
+| BitOr (`\|`) | |
+| BitXor (`^`) | |
+| BitwiseNot (`~`) | |
+| BoolAnd (`bool_and`) | |
+| BoolOr (`bool_or`) | |
+
+## Aggregate Expressions
+
+| Expression | Notes |
+| ------------- | ----- |
+| Avg | |
+| BitAndAgg | |
+| BitOrAgg | |
+| BitXorAgg | |
+| Corr | |
+| Count | |
+| CovPopulation | |
+| CovSample | |
+| First | |
+| Last | |
+| Max | |
+| Min | |
+| StddevPop | |
+| StddevSamp | |
+| Sum | |
+| VariancePop | |
+| VarianceSamp | |
+
+## Other
+
+| Expression | Notes
|
+| ----------------------- |
-------------------------------------------------------------------------------
|
+| Cast | See compatibility guide for list of supported cast
expressions and known issues |
+| BloomFilterMightContain |
|
+| ScalarSubquery |
|
+| Coalesce |
|
+| NormalizeNaNAndZero |
|
diff --git a/docs/source/user-guide/operators.md
b/docs/source/user-guide/operators.md
index ec82e9f6..e3a3ac52 100644
--- a/docs/source/user-guide/operators.md
+++ b/docs/source/user-guide/operators.md
@@ -19,15 +19,20 @@
# Supported Spark Operators
-The following Spark operators are currently available:
+The following Spark operators are currently replaced with native versions.
Query stages that contain any operators
+not supported by Comet will fall back to regular Spark execution.
-- FileSourceScanExec/BatchScanExec for Parquet
-- Projection
-- Filter
-- Sort
-- Hash Aggregate
-- Limit
-- Sort-merge Join
-- Hash Join
-- Shuffle
-- Expand
+| Operator | Notes |
+| -------------------------------------------- | ----- |
+| FileSourceScanExec/BatchScanExec for Parquet | |
+| Projection | |
+| Filter | |
+| Sort | |
+| Hash Aggregate | |
+| Limit | |
+| Sort-merge Join | |
+| Hash Join | |
+| BroadcastHashJoinExec | |
+| Shuffle | |
+| Expand | |
+| Union | |
diff --git a/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
b/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
index 8d81b57c..448c4ff0 100644
--- a/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
+++ b/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
@@ -1987,13 +1987,13 @@ object QueryPlanSerde extends Logging with
ShimQueryPlanSerde with CometExprShim
case UnaryMinus(child, failOnError) =>
val childExpr = exprToProtoInternal(child, inputs)
if (childExpr.isDefined) {
- val builder = ExprOuterClass.Negative.newBuilder()
+ val builder = ExprOuterClass.UnaryMinus.newBuilder()
builder.setChild(childExpr.get)
builder.setFailOnError(failOnError)
Some(
ExprOuterClass.Expr
.newBuilder()
- .setNegative(builder)
+ .setUnaryMinus(builder)
.build())
} else {
withInfo(expr, child)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]