[I] [Feature] Support Spark expression: map_sort [datafusion-comet]

via GitHub Wed, 14 Jan 2026 17:04:07 -0800


andygrove opened a new issue, #3171:
URL: https://github.com/apache/datafusion-comet/issues/3171


   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support the Spark `map_sort` function, causing 
queries using this function to fall back to Spark's JVM execution instead of 
running natively on DataFusion.
   
   The `MapSort` expression takes a map as input and returns a new map with the same key-value pairs, with entries ordered by key in ascending order (the natural ordering of the key type). It is null-intolerant: if the input map is null, the result is null.
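   A small reference model of these semantics in plain Scala could serve as a test oracle when checking a native implementation. It is a sketch under stated assumptions, not Spark's implementation. `MapSortModel` is a hypothetical name. A map is modeled as a `Seq` of key-value pairs and a SQL null map as `None`. Keys are compared with whatever implicit `Ordering` is in scope, which may not match Spark's ordering for every key type.

   ```scala
   object MapSortModel {
     // Hypothetical reference model: a map is a sequence of (key, value) entries
     // and a SQL NULL map is represented as None.
     def mapSort[K, V](m: Option[Seq[(K, V)]])(implicit ord: Ordering[K]): Option[Seq[(K, V)]] =
       // Null-intolerant: None stays None; otherwise entries are sorted by key, ascending.
       // An empty map stays empty; keys are assumed unique, as in a Spark map.
       m.map(entries => entries.sortBy(_._1))
   }
   ```

   For example, `MapSortModel.mapSort(Some(Seq(3 -> "c", 1 -> "a", 2 -> "b")))` equals `Some(Seq(1 -> "a", 2 -> "b", 3 -> "c"))`, and `MapSortModel.mapSort[Int, String](None)` is `None`.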
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   map_sort(map_expr)
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | `map_expr` | `MapType` | The input map expression to be sorted by its keys (the `base` child of `MapSort`) |
   
   **Return Type:** Returns the same `MapType` as the input, with identical key 
and value types but with entries sorted by key.
   
   **Supported Data Types:**
   The input must be a `MapType` where the key type supports ordering. 
Supported key types include:
   
   - Numeric types (IntegerType, LongType, FloatType, DoubleType, etc.)
   - StringType 
   - DateType
   - TimestampType
   - Any other types where `RowOrdering.isOrderable()` returns true
   
   **Edge Cases:**
   - **Null handling**: Returns null if the input map is null (null-intolerant 
behavior)
   - **Empty maps**: Returns an empty map of the same type
   - **Duplicate keys**: No special handling is needed, since map keys are unique (Spark either rejects or deduplicates duplicate keys when a map is created, depending on `spark.sql.mapKeyDedupPolicy`)
   - **Non-orderable keys**: Fails analysis with a `DataTypeMismatch` error of subclass `INVALID_ORDERING_TYPE`
   - **Wrong input type**: Fails analysis with a `DataTypeMismatch` error of subclass `UNEXPECTED_INPUT_TYPE` for non-map inputs
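   The two analysis-time failures can be read as a type check with a fixed order. The input must first be a map, and then its key type must be orderable. The sketch below models that order with a hypothetical, heavily simplified type hierarchy. `SqlType`, `MapT`, `ArrayT`, and the other case objects are illustrative names, not Spark classes. Its `isOrderable` is an assumed approximation of `RowOrdering.isOrderable`, and it reports the error subclass as a plain string.

   ```scala
   object MapSortTypeCheck {
     // Hypothetical, simplified stand-ins for a few Spark SQL data types.
     sealed trait SqlType
     case object IntegerT extends SqlType
     case object DoubleT extends SqlType
     case object StringT extends SqlType
     case object DateT extends SqlType
     case object TimestampT extends SqlType
     final case class ArrayT(element: SqlType) extends SqlType
     final case class MapT(key: SqlType, value: SqlType) extends SqlType

     // Assumed approximation of RowOrdering.isOrderable: atomic types are orderable,
     // arrays are orderable if their elements are, and maps are not orderable.
     def isOrderable(t: SqlType): Boolean = t match {
       case ArrayT(e)  => isOrderable(e)
       case MapT(_, _) => false
       case _          => true
     }

     // Checks the input type first, then the key type; Right means the input is accepted.
     def checkInput(input: SqlType): Either[String, MapT] = input match {
       case m @ MapT(k, _) if isOrderable(k) => Right(m)
       case MapT(_, _)                       => Left("INVALID_ORDERING_TYPE")
       case _                                => Left("UNEXPECTED_INPUT_TYPE")
     }
   }
   ```

   With this model, `checkInput(MapT(StringT, IntegerT))` is accepted, while `checkInput(ArrayT(IntegerT))` yields `Left("UNEXPECTED_INPUT_TYPE")`.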
   
   **Examples:**
   ```sql
   -- Sort a map by its keys
   SELECT map_sort(map(3, 'c', 1, 'a', 2, 'b')) AS sorted_map;
   -- Result: {1 -> 'a', 2 -> 'b', 3 -> 'c'}
   
   -- Sort a string-keyed map
   SELECT map_sort(map('zebra', 1, 'apple', 2, 'banana', 3)) AS sorted_map;
   -- Result: {'apple' -> 2, 'banana' -> 3, 'zebra' -> 1}
   ```
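   The string-keyed example depends on how string keys are compared. This sketch assumes that Spark's default (binary) string ordering compares the UTF-8 bytes of `UTF8String` values as unsigned bytes. Rust's `str` ordering is also lexicographic by bytes. Java's and Scala's default `String` ordering compares UTF-16 code units instead, and the two orderings can disagree for some non-ASCII keys. The hypothetical `Utf8KeyOrdering` object below implements the byte-wise ordering so the difference can be checked.

   ```scala
   import java.nio.charset.StandardCharsets

   object Utf8KeyOrdering {
     // Unsigned, lexicographic comparison of the UTF-8 bytes of two strings
     // (assumed to match Spark's default binary ordering for StringType keys).
     val utf8Binary: Ordering[String] = new Ordering[String] {
       def compare(a: String, b: String): Int = {
         val x = a.getBytes(StandardCharsets.UTF_8)
         val y = b.getBytes(StandardCharsets.UTF_8)
         val n = math.min(x.length, y.length)
         var i = 0
         while (i < n) {
           // Mask with 0xff so bytes compare as unsigned values (0..255).
           val c = (x(i) & 0xff) - (y(i) & 0xff)
           if (c != 0) return c
           i += 1
         }
         // A strict prefix sorts first.
         x.length - y.length
       }
     }
   }
   ```

   With either ordering, the example keys sort as `apple`, `banana`, `zebra`. The orderings diverge for keys such as `"\uFF61"` (U+FF61) and `"\uD83D\uDE00"` (U+1F600). UTF-16 code-unit order puts U+FF61 after U+1F600, but UTF-8 byte order puts it before.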
   
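   ```scala
   // DataFrame API usage (assumes an active SparkSession named `spark`)
   import org.apache.spark.sql.functions._
   
   // Sort an existing map column; `df` is assumed to have a MapType column "map_column"
   val sortedDf = df.select(map_sort(col("map_column")))
   
   // Create a map literal and sort it
   val literalDf = spark.range(1).select(
     map_sort(map(lit(3), lit("c"), lit(1), lit("a"), lit(2), lit("b")))
   )
   ```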
   
   ### Implementation Approach
   
   See the [Comet guide on adding new 
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
 for detailed instructions.
   
   1. **Scala Serde**: Add an expression handler in `spark/src/main/scala/org/apache/comet/serde/`
   2. **Register**: Add the handler to the appropriate expression map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add a message type in `native/proto/src/proto/expr.proto` if needed
   4. **Rust**: Implement the expression in `native/spark-expr/src/` (first check whether DataFusion already has built-in support)
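   The native step (4) would operate on columnar arrays rather than Scala collections. The Scala sketch below models the per-row work such a kernel might do on a map column. For each row, it sorts the entry indices by key and gathers keys and values into new arrays. Offsets and the null mask stay unchanged, because sorting never changes how many entries a row has. `MapColumn` and `sortByKeys` are hypothetical names. The layout only loosely mirrors Arrow's map layout (offsets, keys, values, validity), and a real implementation would live in Rust under `native/spark-expr/src/`.

   ```scala
   import scala.reflect.ClassTag

   object MapSortKernel {
     // Hypothetical columnar layout: row i owns entries offsets(i) until offsets(i + 1);
     // valid(i) == false marks a NULL map.
     final case class MapColumn[K, V](
         offsets: Array[Int],
         keys: Array[K],
         values: Array[V],
         valid: Array[Boolean])

     def sortByKeys[K: Ordering: ClassTag, V: ClassTag](in: MapColumn[K, V]): MapColumn[K, V] = {
       val keysOut = new Array[K](in.keys.length)
       val valuesOut = new Array[V](in.values.length)
       for (row <- in.valid.indices) {
         val start = in.offsets(row)
         val end = in.offsets(row + 1)
         // Sort this row's entry indices by key, then gather keys and values in that order.
         // NULL rows are processed the same way, so any entries they own are carried along.
         val order = (start until end).sortBy(i => in.keys(i))
         order.zipWithIndex.foreach { case (src, k) =>
           keysOut(start + k) = in.keys(src)
           valuesOut(start + k) = in.values(src)
         }
       }
       // Offsets and validity are reused as-is.
       MapColumn(in.offsets, keysOut, valuesOut, in.valid)
     }
   }
   ```

   Because each row's entries are permuted only within that row's offset range, rows can be processed independently.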
   
   
   ## Additional context
   
   **Difficulty:** Medium
   **Spark Expression Class:** 
`org.apache.spark.sql.catalyst.expressions.MapSort`
   
   **Related:**
   - `map_keys()` - Extract keys from a map
   - `map_values()` - Extract values from a map  
   - `map_from_entries()` - Create map from array of structs
   - `sort_array()` - Sort arrays by element value
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   

