This is an automated email from the ASF dual-hosted git repository.
mbutrovich pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 84df1ce61 docs: recommend SQL file tests for new expressions (#3598)
84df1ce61 is described below
commit 84df1ce61df409243c89d65d1aeb347234b5bc21
Author: Andy Grove <[email protected]>
AuthorDate: Wed Feb 25 07:44:57 2026 -0700
docs: recommend SQL file tests for new expressions (#3598)
Update the "Adding Spark-side Tests" section in the contributor guide
to recommend the SQL file test framework as the preferred way to add
test coverage for new expressions, with a link to the full SQL file
tests documentation. The Scala test approach is preserved as an
alternative for cases requiring programmatic setup.
Co-authored-by: Claude Opus 4.6 <[email protected]>
---
.../contributor-guide/adding_a_new_expression.md | 68 +++++++++++++++++-----
1 file changed, 53 insertions(+), 15 deletions(-)
diff --git a/docs/source/contributor-guide/adding_a_new_expression.md b/docs/source/contributor-guide/adding_a_new_expression.md
index 7853c126b..e989b7636 100644
--- a/docs/source/contributor-guide/adding_a_new_expression.md
+++ b/docs/source/contributor-guide/adding_a_new_expression.md
@@ -210,9 +210,59 @@ Any notes provided will be logged to help with debugging and understanding why a
#### Adding Spark-side Tests for the New Expression
-It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected spark behavior. To do this, you can add a set of test cases in the `CometExpressionSuite`, and use the `checkSparkAnswerAndOperator` method to compare the results of the new expression with the expected Spark results and that Comet's native execution engine is able to execute the expression.
+It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected Spark behavior. The preferred way to add test coverage is to write a SQL test file using the SQL file test framework. This approach is simpler than writing Scala test code and makes it easy to cover many input combinations and edge cases.
+
+##### Writing a SQL test file
+
+Create a `.sql` file under the appropriate subdirectory in `spark/src/test/resources/sql-tests/expressions/` (e.g., `string/`, `math/`, `array/`). The file should create a table with test data, then run queries that exercise the expression. Here is an example for the `unhex` expression:
+
+```sql
+-- ConfigMatrix: parquet.enable.dictionary=false,true
+
+statement
+CREATE TABLE test_unhex(col string) USING parquet
+
+statement
+INSERT INTO test_unhex VALUES
+ ('537061726B2053514C'),
+ ('737472696E67'),
+ ('\0'),
+ (''),
+ ('###'),
+ ('G123'),
+ ('hello'),
+ ('A1B'),
+ ('0A1B'),
+ (NULL)
+
+-- column argument
+query
+SELECT unhex(col) FROM test_unhex
+
+-- literal arguments
+query
+SELECT unhex('48656C6C6F'), unhex(''), unhex(NULL)
+```
+
+Each `query` block automatically runs the SQL through both Spark and Comet and compares results, and also verifies that Comet executes the expression natively (not falling back to Spark).
+
+Run the test with:
+
+```shell
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite unhex" -Dtest=none
+```
+
+For full documentation on the test file format — including directives like `ConfigMatrix`, query modes like `spark_answer_only` and `tolerance`, handling known bugs with `ignore(...)`, and tips for writing thorough tests — see the [SQL File Tests](sql-file-tests.md) guide.
+
+##### Tips
-For example, this is the test case for the `unhex` expression:
+- **Cover both column references and literals.** Comet often uses different code paths for each. The SQL file test suite automatically disables constant folding, so all-literal queries are evaluated natively.
+- **Include edge cases** such as `NULL`, empty strings, boundary values, `NaN`, and multibyte UTF-8 characters.
+- **Keep one file per expression** to make failures easy to locate.
+
+##### Scala tests (alternative)
+
+For cases that require programmatic setup or custom assertions beyond what SQL files support, you can also add Scala test cases in `CometExpressionSuite` using the `checkSparkAnswerAndOperator` method:
```scala
test("unhex") {
@@ -236,11 +286,7 @@ test("unhex") {
}
```
-#### Testing with Literal Values
-
-When writing tests that use literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time rather than execution time. This means your Comet implementation might not actually be exercised during the test.
-
-To ensure literal expressions are executed by Comet, disable the constant folding optimizer:
+When writing Scala tests with literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time, bypassing Comet. To prevent this, disable constant folding:
```scala
test("my_func with literals") {
@@ -251,14 +297,6 @@ test("my_func with literals") {
}
```
-This is particularly important for:
-
-- Edge case tests using specific literal values (e.g., null handling, overflow conditions)
-- Tests verifying behavior with special input values
-- Any test where the expression inputs are entirely literal
-
-When possible, prefer testing with column references from tables (as shown in the `unhex` example above), which naturally avoids the constant folding issue.
-
### Adding the Expression To the Protobuf Definition
Once you have the expression implemented in Scala, you might need to update the protobuf definition to include the new expression. You may not need to do this if the expression is already covered by the existing protobuf definition (e.g. you're adding a new scalar function that uses the `ScalarFunc` message).
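The literal-value test pattern described in the diff above (disabling constant folding in a Scala test) can be sketched roughly as follows. This is a hedged sketch, not a verbatim copy of the repository's code: it assumes the test lives in a suite extending Comet's test base class, which provides `checkSparkAnswerAndOperator` and `withSQLConf`; the excluded rule is Spark's real `ConstantFolding` optimizer rule, configured via the standard `spark.sql.optimizer.excludedRules` setting.

```scala
// Sketch: keep literal-only expressions from being folded at planning time,
// so they actually reach Comet's native engine during execution.
// Assumes a suite extending CometTestBase (helper names may differ slightly).
test("my_func with literals") {
  withSQLConf(
    // Exclude Spark's ConstantFolding rule; without this, an all-literal
    // expression is evaluated by the optimizer and Comet is never exercised.
    "spark.sql.optimizer.excludedRules" ->
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
    checkSparkAnswerAndOperator("SELECT unhex('48656C6C6F'), unhex(NULL)")
  }
}
```

As the diff notes, the SQL file test framework disables constant folding automatically, so this manual config is only needed in the Scala-test alternative.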
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]