andygrove opened a new issue, #3169:
URL: https://github.com/apache/datafusion-comet/issues/3169
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
> details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `map_from_entries` function,
causing queries using this function to fall back to Spark's JVM execution
instead of running natively on DataFusion.
The `MapFromEntries` expression converts an array of struct entries into a
map. Each struct in the input array must contain exactly two fields, where the
first field becomes the key and the second field becomes the value in the
resulting map.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
map_from_entries(array_of_structs)
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| child | Expression | An array of struct expressions where each struct has exactly two fields |
**Return Type:** Returns a `MapType` where the key type corresponds to the
first field type of the input struct and the value type corresponds to the
second field type.
**Supported Data Types:**
- Input: Array of struct types with exactly two fields
- The struct fields can be of any supported Spark SQL data type
- Key types cannot be or contain a map type (Spark disallows `MapType` as a map
key); arrays and structs are allowed as keys
- Value types can be any Spark SQL data type
**Edge Cases:**
- Null handling: if the input array is null, the result is null
- Empty array: returns an empty map
- Null struct entries: a null entry in the array makes the whole result null
(entries are not skipped)
- Duplicate keys: governed by `spark.sql.mapKeyDedupPolicy`; the default
`EXCEPTION` raises a runtime error, while `LAST_WINS` keeps the last entry for
each key
- Struct validation: analysis-time error if the array elements are not structs
with exactly two fields
**Examples:**
```sql
-- Convert array of structs to map
SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b')));
-- Result: {1:"a", 2:"b"}
-- Using named struct fields
SELECT map_from_entries(array(
  struct(1 as id, 'Alice' as name),
  struct(2 as id, 'Bob' as name)));
-- Result: {1:"Alice", 2:"Bob"}
-- Empty array case (a bare array() is not typed as an array of structs,
-- so cast it to the expected element type)
SELECT map_from_entries(cast(array() as array<struct<k: int, v: string>>));
-- Result: {}
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(map_from_entries(
  array(
    struct(lit(1), lit("a")),
    struct(lit(2), lit("b"))
  )
))
// Using existing array column
df.select(map_from_entries(col("struct_array")))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
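For step 4, duplicate-key handling is the subtle part. A hedged Rust sketch of
that dedup step, with a hypothetical `DedupPolicy` enum mirroring Spark's
`spark.sql.mapKeyDedupPolicy` setting and types fixed to `i32`/`String` purely
for illustration (the real kernel would operate on Arrow arrays):

```rust
// Hypothetical mirror of spark.sql.mapKeyDedupPolicy.
#[derive(Clone, Copy)]
enum DedupPolicy {
    Exception, // default: a duplicate key is a runtime error
    LastWins,  // later entries replace earlier ones
}

// Build a map (as an insertion-ordered Vec of pairs) from entries,
// applying the configured duplicate-key policy.
fn build_map(
    entries: &[(i32, &str)],
    policy: DedupPolicy,
) -> Result<Vec<(i32, String)>, String> {
    let mut out: Vec<(i32, String)> = Vec::with_capacity(entries.len());
    for &(k, v) in entries {
        match out.iter().position(|(ek, _)| *ek == k) {
            Some(i) => match policy {
                DedupPolicy::Exception => {
                    return Err(format!("Duplicate map key {} was found", k));
                }
                DedupPolicy::LastWins => out[i].1 = v.to_string(),
            },
            None => out.push((k, v.to_string())),
        }
    }
    Ok(out)
}
```

Whichever policy Comet implements natively should match what Spark does under
the session's configuration, or the expression should fall back when the
configured policy is unsupported.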
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.MapFromEntries`
**Related:**
- `map_entries()` - Reverse operation that converts map to array of structs
- `map()` - Direct map construction from alternating key-value arguments
- `struct()` - Creates struct expressions used as input to this function
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]