andygrove opened a new issue, #3159:
URL: https://github.com/apache/datafusion-comet/issues/3159
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `sort_array` function, causing
queries using this function to fall back to Spark's JVM execution instead of
running natively on DataFusion.
The `SortArray` expression sorts the elements of an array in either
ascending or descending order. It returns a new array with the same elements
sorted according to the specified ordering, with null values handled according
to a consistent null-first or null-last policy.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
sort_array(array[, ascendingOrder])
```
```scala
sort_array(col("array_column"))
sort_array(col("array_column"), lit(false)) // descending order
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| `base` | ArrayType | The input array to be sorted |
| `ascendingOrder` | BooleanType | Optional. If true (default), sorts in
ascending order; if false, sorts in descending order. Must be a foldable
expression (constant). |
**Return Type:** Returns an `ArrayType` with the same element type and
nullability as the input array.
**Supported Data Types:**
Supports arrays containing any orderable data types:
- Numeric types (IntegerType, LongType, DoubleType, FloatType, etc.)
- String types (StringType)
- Date and timestamp types
- Binary types
- Does NOT support arrays containing non-orderable types like MapType or
complex nested structures
**Edge Cases:**
- **Null arrays**: Returns null if the input array is null
- **Empty arrays**: Returns an empty array of the same type
- **Arrays with all nulls**: Returns an array with all nulls in the same
positions (nulls are equal in comparison)
- **Mixed null and non-null elements**: Nulls are consistently placed at the
beginning (ascending) or end (descending)
- **Non-foldable ascendingOrder**: Throws DataTypeMismatch error - the
ordering parameter must be a constant
- **Non-orderable element types**: Throws DataTypeMismatch error during type
checking
**Examples:**
```sql
-- Basic ascending sort (default)
SELECT sort_array(array(3, 1, 4, 1, 5)) AS sorted;
-- Result: [1, 1, 3, 4, 5]
-- Descending sort
SELECT sort_array(array('d', 'c', 'b', 'a', null), false) AS sorted_desc;
-- Result: ['d', 'c', 'b', 'a', null]
-- With null values (ascending)
SELECT sort_array(array(3, null, 1, null, 2)) AS sorted_with_nulls;
-- Result: [null, null, 1, 2, 3]
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(sort_array(col("numbers"))).show()
df.select(sort_array(col("strings"), lit(false))).show()
// With column reference for array
df.withColumn("sorted_values", sort_array(col("value_array"))).show()
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.SortArray`
**Related:**
- `array_sort` - Alternative function name in some Spark versions
- `array_max`, `array_min` - For finding extremes without full sorting
- `shuffle` - For randomizing array element order
- `reverse` - For reversing array element order
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]