andygrove opened a new issue, #3149:
URL: https://github.com/apache/datafusion-comet/issues/3149
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
> details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `array_exists` function, causing
queries using this function to fall back to Spark's JVM execution instead of
running natively on DataFusion.
The `ArrayExists` expression checks whether any element in an array
satisfies a given predicate function. It applies a lambda function to each
element of the array and returns true if at least one element makes the
predicate evaluate to true.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
EXISTS(array, lambda_function)
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| `argument` | `ArrayType` | The input array to evaluate |
| `function` | `LambdaFunction` | The predicate function to apply to each array element |
| `followThreeValuedLogic` | `Boolean` | Controls null-handling behavior (internal parameter) |
**Return Type:** `BooleanType` - Returns true if any element satisfies the
predicate, false if none do, or null in certain null-handling scenarios.
**Supported Data Types:**
- Input: Any `ArrayType` with elements of any data type
- Lambda function must return a boolean result
- Supports arrays with nullable elements
**Edge Cases:**
- **Null array input**: Returns null if the input array itself is null
- **Empty array**: Returns false for empty arrays
- **Null lambda results**: When `followThreeValuedLogic` is true and lambda
returns null for some elements but no element returns true, the overall result
is null
- **Legacy mode**: When `followThreeValuedLogic` is false, null lambda
results are effectively treated as false: the result is false unless some
element evaluates to true
- **Nullable elements**: Properly handles null elements within the array by
passing them to the lambda function
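The null-handling rules above can be sketched in Rust. This is a minimal model of the semantics for illustration, not Comet's actual implementation: elements are simplified to `Option<i32>` and the predicate returns `Option<bool>`, with `None` standing for SQL NULL (three-valued logic).

```rust
/// Minimal model of Spark's ArrayExists null-handling semantics.
/// `None` represents SQL NULL for the array, its elements, and the
/// predicate result.
fn array_exists<F>(
    arr: Option<&[Option<i32>]>,
    pred: F,
    follow_three_valued_logic: bool,
) -> Option<bool>
where
    F: Fn(Option<i32>) -> Option<bool>,
{
    // Null array input -> null result.
    let arr = arr?;
    let mut saw_null = false;
    for &elem in arr {
        match pred(elem) {
            Some(true) => return Some(true), // short-circuit on first match
            Some(false) => {}
            None => saw_null = true, // remember null predicate results
        }
    }
    if follow_three_valued_logic && saw_null {
        None // no true, but some null predicate results -> unknown
    } else {
        Some(false) // empty array, or all false (legacy: nulls act as false)
    }
}

fn main() {
    // EXISTS(array(1, 2, 3), x -> x > 2) -> true
    assert_eq!(
        array_exists(Some(&[Some(1), Some(2), Some(3)]), |x| x.map(|v| v > 2), true),
        Some(true)
    );
    // EXISTS(array(1, null, 3), x -> x IS NULL) -> true
    assert_eq!(
        array_exists(Some(&[Some(1), None, Some(3)]), |x| Some(x.is_none()), true),
        Some(true)
    );
    // Empty array -> false; null array -> null.
    assert_eq!(array_exists(Some(&[]), |x| x.map(|v| v > 2), true), Some(false));
    assert_eq!(array_exists(None, |x| x.map(|v| v > 2), true), None);
    // Null predicate result with no true element:
    // null in three-valued mode, false in legacy mode.
    assert_eq!(array_exists(Some(&[Some(1), None]), |x| x.map(|v| v > 2), true), None);
    assert_eq!(array_exists(Some(&[Some(1), None]), |x| x.map(|v| v > 2), false), Some(false));
}
```

The `main` function doubles as a check that each edge case listed above produces the documented result.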
**Examples:**
```sql
-- Check if any element is null
SELECT EXISTS(array(1, 2, 3), x -> x IS NULL);
-- Returns: false
-- Check if any element is greater than 2
SELECT EXISTS(array(1, 2, 3), x -> x > 2);
-- Returns: true
-- Check with null elements
SELECT EXISTS(array(1, null, 3), x -> x IS NULL);
-- Returns: true
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(exists(col("array_column"), x => x > lit(10)))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
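For step 3, a lambda-bearing expression needs to carry both the input array and the predicate body. A hypothetical protobuf sketch follows; the message and field names are illustrative only and do not reflect Comet's actual `expr.proto` schema:

```proto
// Hypothetical message shape for ArrayExists (names are illustrative).
message ArrayExists {
  Expr array_expr = 1;                 // the input array expression
  Expr predicate = 2;                  // lambda body over the bound element
  bool follow_three_valued_logic = 3;  // null-handling mode
}
```

How the lambda's bound variable is represented (e.g. as a named reference resolved on the native side) would need to follow whatever convention Comet adopts for higher-order functions.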
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.ArrayExists`
**Related:**
- `ArrayForAll` - Checks if all elements satisfy a predicate
- `ArrayFilter` - Filters array elements based on a predicate
- `ArrayTransform` - Transforms array elements using a lambda function
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]