andygrove opened a new issue, #3171:
URL: https://github.com/apache/datafusion-comet/issues/3171
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `map_sort` function, causing
queries using this function to fall back to Spark's JVM execution instead of
running natively on DataFusion.
The `MapSort` expression takes a map as input and returns a new map with the
same key-value pairs, ordered by the natural ascending ordering of the keys.
The expression is null-intolerant: it returns null if the input map is null.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
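The semantics described above can be sketched in plain Rust. This is an illustration only, not Comet's or Spark's implementation: the map is modelled as a hypothetical `SparkMap` alias (an `Option` over a list of entries, with `None` standing in for a SQL null), and `map_sort` is a made-up helper name.

```rust
/// Illustrative model only: a Spark map value as an ordered list of entries,
/// with `None` standing in for a SQL NULL map.
type SparkMap<K, V> = Option<Vec<(K, V)>>;

/// Hypothetical helper mirroring the described `map_sort` semantics:
/// null in, null out; otherwise entries are reordered by ascending key.
fn map_sort<K: Ord, V>(input: SparkMap<K, V>) -> SparkMap<K, V> {
    input.map(|mut entries| {
        // Keys are assumed unique, so ties never need to be broken.
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        entries
    })
}

fn main() {
    // Non-null map: entries come back ordered by key.
    let sorted = map_sort(Some(vec![(3, "c"), (1, "a"), (2, "b")]));
    println!("{:?}", sorted);

    // Null map: the result is also null.
    let null_map: SparkMap<i32, &str> = None;
    println!("{:?}", map_sort(null_map));
}
```

An empty map passes through unchanged, because sorting an empty entry list is a no-op.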
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
map_sort(map_expr)
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| map_expr | MapType | The input map expression to be sorted by its keys |
**Return Type:** Returns the same `MapType` as the input, with identical key
and value types but with entries sorted by key.
**Supported Data Types:**
The input must be a `MapType` where the key type supports ordering.
Supported key types include:
- Numeric types (IntegerType, LongType, FloatType, DoubleType, etc.)
- StringType
- DateType
- TimestampType
- Any other types where `RowOrdering.isOrderable()` returns true
**Edge Cases:**
- **Null handling**: Returns null if the input map is null (null-intolerant
behavior)
- **Empty maps**: Returns an empty map of the same type
- **Duplicate keys**: Not applicable, since maps cannot have duplicate keys
by definition, so the sort never has to break ties
- **Non-orderable keys**: Fails type checking with a `DataTypeMismatch`
error, `INVALID_ORDERING_TYPE` subclass
- **Wrong input type**: Fails type checking with a `DataTypeMismatch`
error, `UNEXPECTED_INPUT_TYPE` subclass, for non-map inputs
**Examples:**
```sql
-- Sort a map by its keys
SELECT map_sort(map(3, 'c', 1, 'a', 2, 'b')) AS sorted_map;
-- Result: {1 -> 'a', 2 -> 'b', 3 -> 'c'}
-- Sort a string-keyed map
SELECT map_sort(map('zebra', 1, 'apple', 2, 'banana', 3)) AS sorted_map;
-- Result: {'apple' -> 2, 'banana' -> 3, 'zebra' -> 1}
```
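```scala
// DataFrame API usage. Assumes `df` is an existing DataFrame with a
// map-typed column named `map_column`.
import org.apache.spark.sql.functions._
val sortedColumn = df.select(map_sort(col("map_column")))

// Creating and sorting a map built from literals
val sortedLiteral = spark.range(1).select(
  map_sort(map(lit(3), lit("c"), lit(1), lit("a"), lit(2), lit("b")))
)
```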
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
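For step 4, a native kernel would operate on a whole column of maps rather than one map at a time. The following is a minimal sketch of that idea in std-only Rust, assuming an Arrow-like layout where each row owns a contiguous range of entries delimited by offsets. `MapColumn` and `sort_map_column` are hypothetical names for illustration; the sketch does not use the arrow-rs or DataFusion APIs, and any built-in DataFusion support should be preferred if it exists.

```rust
/// Simplified stand-in for a map column (assumption: not the real arrow-rs
/// types). Row `i` owns entries `offsets[i]..offsets[i + 1]`, and offsets
/// are assumed to start at 0.
struct MapColumn {
    offsets: Vec<usize>,
    keys: Vec<i32>,
    values: Vec<String>,
    validity: Vec<bool>, // false marks a NULL map
}

/// Sorts the entries of each row by key. Offsets and validity are copied
/// unchanged, because sorting never changes how many entries a row has.
fn sort_map_column(col: &MapColumn) -> MapColumn {
    let mut keys = Vec::with_capacity(col.keys.len());
    let mut values = Vec::with_capacity(col.values.len());
    for row in 0..col.validity.len() {
        let (start, end) = (col.offsets[row], col.offsets[row + 1]);
        // Sort entry indices rather than the data so that each key and its
        // value move together.
        let mut idx: Vec<usize> = (start..end).collect();
        idx.sort_by_key(|&i| col.keys[i]);
        for i in idx {
            keys.push(col.keys[i]);
            values.push(col.values[i].clone());
        }
    }
    MapColumn {
        offsets: col.offsets.clone(),
        keys,
        values,
        validity: col.validity.clone(),
    }
}

fn main() {
    // Three rows: {3->c, 1->a, 2->b}, NULL, {9->z, 7->y}
    let col = MapColumn {
        offsets: vec![0, 3, 3, 5],
        keys: vec![3, 1, 2, 9, 7],
        values: ["c", "a", "b", "z", "y"].iter().map(|s| s.to_string()).collect(),
        validity: vec![true, false, true],
    };
    let sorted = sort_map_column(&col);
    println!("{:?} {:?}", sorted.keys, sorted.values);
}
```

Sorting an index permutation per row keeps the key and value buffers aligned, and the null row stays null because its validity bit is carried over untouched.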
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.MapSort`
**Related:**
- `map_keys()` - Extract keys from a map
- `map_values()` - Extract values from a map
- `map_from_entries()` - Create map from array of structs
- `sort_array()` - Sort arrays by element value
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]