This is an automated email from the ASF dual-hosted git repository.
mbutrovich pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 84df1ce61 docs: recommend SQL file tests for new expressions (#3598)
84df1ce61 is described below
commit 84df1ce61df409243c89d65d1aeb347234b5bc21
Author: Andy Grove <[email protected]>
AuthorDate: Wed Feb 25 07:44:57 2026 -0700
docs: recommend SQL file tests for new expressions (#3598)
Update the "Adding Spark-side Tests" section in the contributor guide
to recommend the SQL file test framework as the preferred way to add
test coverage for new expressions, with a link to the full SQL file
tests documentation. The Scala test approach is preserved as an
alternative for cases requiring programmatic setup.
Co-authored-by: Claude Opus 4.6 <[email protected]>
---
.../contributor-guide/adding_a_new_expression.md | 68 +++++++++++++++++-----
1 file changed, 53 insertions(+), 15 deletions(-)
diff --git a/docs/source/contributor-guide/adding_a_new_expression.md b/docs/source/contributor-guide/adding_a_new_expression.md
index 7853c126b..e989b7636 100644
--- a/docs/source/contributor-guide/adding_a_new_expression.md
+++ b/docs/source/contributor-guide/adding_a_new_expression.md
@@ -210,9 +210,59 @@ Any notes provided will be logged to help with debugging and understanding why a
#### Adding Spark-side Tests for the New Expression
-It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected spark behavior. To do this, you can add a set of test cases in the `CometExpressionSuite`, and use the `checkSparkAnswerAndOperator` method to compare the results of the new expression with the expected Spark results and that Comet's native execution engine is able to execute the expression.
+It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected Spark behavior. The preferred way to add test coverage is to write a SQL test file using the SQL file test framework. This approach is simpler than writing Scala test code and makes it easy to cover many input combinations and edge cases.
+
+##### Writing a SQL test file
+
+Create a `.sql` file under the appropriate subdirectory in `spark/src/test/resources/sql-tests/expressions/` (e.g., `string/`, `math/`, `array/`). The file should create a table with test data, then run queries that exercise the expression. Here is an example for the `unhex` expression:
+
+```sql
+-- ConfigMatrix: parquet.enable.dictionary=false,true
+
+statement
+CREATE TABLE test_unhex(col string) USING parquet
+
+statement
+INSERT INTO test_unhex VALUES
+ ('537061726B2053514C'),
+ ('737472696E67'),
+ ('\0'),
+ (''),
+ ('###'),
+ ('G123'),
+ ('hello'),
+ ('A1B'),
+ ('0A1B'),
+ (NULL)
+
+-- column argument
+query
+SELECT unhex(col) FROM test_unhex
+
+-- literal arguments
+query
+SELECT unhex('48656C6C6F'), unhex(''), unhex(NULL)
+```
+
+Each `query` block automatically runs the SQL through both Spark and Comet and compares results, and also verifies that Comet executes the expression natively (not falling back to Spark).
+
+Run the test with:
+
+```shell
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite unhex" -Dtest=none
+```
+
+For full documentation on the test file format — including directives like `ConfigMatrix`, query modes like `spark_answer_only` and `tolerance`, handling known bugs with `ignore(...)`, and tips for writing thorough tests — see the [SQL File Tests](sql-file-tests.md) guide.
+
+##### Tips
-For example, this is the test case for the `unhex` expression:
+- **Cover both column references and literals.** Comet often uses different code paths for each. The SQL file test suite automatically disables constant folding, so all-literal queries are evaluated natively.
+- **Include edge cases** such as `NULL`, empty strings, boundary values, `NaN`, and multibyte UTF-8 characters.
+- **Keep one file per expression** to make failures easy to locate.
+
+##### Scala tests (alternative)
+
+For cases that require programmatic setup or custom assertions beyond what SQL files support, you can also add Scala test cases in `CometExpressionSuite` using the `checkSparkAnswerAndOperator` method:
```scala
test("unhex") {
@@ -236,11 +286,7 @@ test("unhex") {
}
```
-#### Testing with Literal Values
-
-When writing tests that use literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time rather than execution time. This means your Comet implementation might not actually be exercised during the test.
-
-To ensure literal expressions are executed by Comet, disable the constant folding optimizer:
+When writing Scala tests with literal values (e.g., `SELECT my_func('literal')`), Spark's constant folding optimizer may evaluate the expression at planning time, bypassing Comet. To prevent this, disable constant folding:
```scala
test("my_func with literals") {
@@ -251,14 +297,6 @@ test("my_func with literals") {
}
```
-This is particularly important for:
-
-- Edge case tests using specific literal values (e.g., null handling, overflow conditions)
-- Tests verifying behavior with special input values
-- Any test where the expression inputs are entirely literal
-
-When possible, prefer testing with column references from tables (as shown in the `unhex` example above), which naturally avoids the constant folding issue.
-
### Adding the Expression To the Protobuf Definition
Once you have the expression implemented in Scala, you might need to update the protobuf definition to include the new expression. You may not need to do this if the expression is already covered by the existing protobuf definition (e.g. you're adding a new scalar function that uses the `ScalarFunc` message).
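The literal-value test pattern described in the diff above (disabling constant folding in a Scala test) can be sketched roughly as follows. This is a hedged sketch, not a verbatim copy of the repository's code: it assumes the test lives in a suite extending Comet's test base class, which provides `checkSparkAnswerAndOperator` and `withSQLConf`; the excluded rule is Spark's real `ConstantFolding` optimizer rule, configured via the standard `spark.sql.optimizer.excludedRules` setting.

```scala
// Sketch: keep literal-only expressions from being folded at planning time,
// so they actually reach Comet's native engine during execution.
// Assumes a suite extending CometTestBase (helper names may differ slightly).
test("my_func with literals") {
  withSQLConf(
    // Exclude Spark's ConstantFolding rule; without this, an all-literal
    // expression is evaluated by the optimizer and Comet is never exercised.
    "spark.sql.optimizer.excludedRules" ->
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
    checkSparkAnswerAndOperator("SELECT unhex('48656C6C6F'), unhex(NULL)")
  }
}
```

As the diff notes, the SQL file test framework disables constant folding automatically, so this manual config is only needed in the Scala-test alternative.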
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]