andygrove commented on code in PR #2704:
URL: https://github.com/apache/datafusion-comet/pull/2704#discussion_r2504083415
##########
docs/source/contributor-guide/adding_a_new_expression.md:
##########
@@ -41,26 +41,65 @@ Once you know what you want to add, you'll need to update
the query planner to r
### Adding the Expression in Scala
-The `QueryPlanSerde` object has a method `exprToProto`, which is responsible
for converting a Spark expression to a protobuf expression. Within that method
is an `exprToProtoInternal` method that contains a large match statement for
each expression type. You'll need to add a new case to this match statement for
your new expression.
+DataFusion Comet uses a framework based on the `CometExpressionSerde` trait
for converting Spark expressions to protobuf. Instead of a large match
statement, each expression type has its own serialization handler. For
aggregate expressions, use the `CometAggregateExpressionSerde` trait instead.
+
+#### Creating a CometExpressionSerde Implementation
+
+First, create an object that extends `CometExpressionSerde[T]` where `T` is
the Spark expression type. This is typically added to one of the serde files in
`spark/src/main/scala/org/apache/comet/serde/` (e.g., `math.scala`,
`strings.scala`, `arrays.scala`, etc.).
For example, the `unhex` function looks like this:
```scala
-case e: Unhex =>
- val unHex = unhexSerde(e)
+object CometUnhex extends CometExpressionSerde[Unhex] {
+ override def convert(
+ expr: Unhex,
+ inputs: Seq[Attribute],
+ binding: Boolean): Option[ExprOuterClass.Expr] = {
+ val childExpr = exprToProtoInternal(expr.child, inputs, binding)
+ val failOnErrorExpr = exprToProtoInternal(Literal(expr.failOnError),
inputs, binding)
+
+ val optExpr =
+ scalarFunctionExprToProtoWithReturnType(
+ "unhex",
+ expr.dataType,
+ false,
+ childExpr,
+ failOnErrorExpr)
+ optExprWithInfo(optExpr, expr, expr.child)
+ }
+}
+```
+
+The `CometExpressionSerde` trait provides three methods you can override:
+
+* `convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[Expr]` -
**Required**. Converts the Spark expression to protobuf. Return `None` if the
expression cannot be converted.
+* `getSupportLevel(expr: T): SupportLevel` - Optional. Returns `Compatible()`,
`Incompatible()`, or `Unsupported()` to indicate the level of support for the
expression. This allows you to handle edge cases or unsupported data types.
+* `getExprConfigName(expr: T): String` - Optional. Returns a short name for
configuration keys. Defaults to the Spark class name.
+
+For simple scalar functions that map directly to a DataFusion function, you
can use the built-in `CometScalarFunction` implementation:
+
+```scala
+classOf[Cos] -> CometScalarFunction("cos")
+```
+
+#### Registering the Expression Handler
- val childExpr = exprToProtoInternal(unHex._1, inputs)
- val failOnErrorExpr = exprToProtoInternal(unHex._2, inputs)
+Once you've created your `CometExpressionSerde` implementation, register it in
`QueryPlanSerde.scala` by adding it to the appropriate expression map (e.g.,
`mathExpressions`, `stringExpressions`, `predicateExpressions`, etc.):
- val optExpr =
- scalarExprToProtoWithReturnType("unhex", e.dataType, childExpr,
failOnErrorExpr)
- optExprWithInfo(optExpr, expr, unHex._1)
+```scala
+private val mathExpressions: Map[Class[_ <: Expression],
CometExpressionSerde[_]] = Map(
+ // ... other expressions ...
+ classOf[Unhex] -> CometUnhex,
+ classOf[Hex] -> CometHex)
```
-A few things to note here:
+The `exprToProtoInternal` method will automatically use this mapping to find
and invoke your handler when it encounters the corresponding Spark expression
type.
+
+A few things to note:
-* The function is recursively called on child expressions, so you'll need to
make sure that the child expressions are also converted to protobuf.
-* `scalarExprToProtoWithReturnType` is for scalar functions that need return
type information. Your expression may use a different method depending on the
type of expression.
+* The `convert` method is recursively called on child expressions using
`exprToProtoInternal`, so you'll need to make sure that the child expressions
are also converted to protobuf.
+* `scalarFunctionExprToProtoWithReturnType` is for scalar functions that need
return type information. Your expression may use a different method depending
on the type of expression.
Review Comment:
@mbutrovich I made that change. PTAL when you can
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]