andygrove opened a new issue, #3169:
URL: https://github.com/apache/datafusion-comet/issues/3169

   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support the Spark `map_from_entries` function, 
causing queries using this function to fall back to Spark's JVM execution 
instead of running natively on DataFusion.
   
   The `MapFromEntries` expression converts an array of struct entries into a 
map. Each struct in the input array must contain exactly two fields, where the 
first field becomes the key and the second field becomes the value in the 
resulting map.
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   map_from_entries(array_of_structs)
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | child | Expression | An array of struct expressions where each struct has exactly two fields |
   
   **Return Type:** Returns a `MapType` where the key type corresponds to the 
first field type of the input struct and the value type corresponds to the 
second field type.
   
   **Supported Data Types:**
   - Input: an array of structs with exactly two fields
   - Key (first field): any Spark SQL data type except a map or a type that contains a map; arrays and structs are allowed as keys
   - Value (second field): any Spark SQL data type
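
The type rule above can be sketched as a small check. This is a std-only Rust illustration, not Comet or Spark code: the `DataType` enum and the `result_type` function are hypothetical names, and the key restriction assumes Spark's rule that a map key may not be, or contain, a map type.

```rust
/// Hypothetical, heavily simplified stand-in for Spark SQL data types.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int,
    Str,
    Array(Box<DataType>),
    Struct(Vec<DataType>),
    Map(Box<DataType>, Box<DataType>),
}

/// True if the type is a map or nests a map anywhere inside it.
fn contains_map(dt: &DataType) -> bool {
    match dt {
        DataType::Map(_, _) => true,
        DataType::Array(elem) => contains_map(elem),
        DataType::Struct(fields) => fields.iter().any(contains_map),
        _ => false,
    }
}

/// Derives the result type of map_from_entries, or an analysis-style error.
pub fn result_type(input: &DataType) -> Result<DataType, String> {
    match input {
        DataType::Array(elem) => match elem.as_ref() {
            // Only array<struct<k, v>> with exactly two fields is accepted.
            DataType::Struct(fields) if fields.len() == 2 => {
                if contains_map(&fields[0]) {
                    Err("map key cannot be or contain a map type".to_string())
                } else {
                    Ok(DataType::Map(
                        Box::new(fields[0].clone()),
                        Box::new(fields[1].clone()),
                    ))
                }
            }
            _ => Err("element type must be a struct with exactly two fields".to_string()),
        },
        _ => Err("input must be an array".to_string()),
    }
}
```

The first struct field becomes the map key type and the second becomes the value type; everything else is rejected before any row is evaluated.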
   
   **Edge Cases:**
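   - Null input: if the input array is null, the result is null
   - Empty array: returns an empty map
   - Null struct elements: if any entry in the array is null, the whole result is null (entries are not skipped)
   - Null keys: a null key raises a runtime error, because map keys cannot be null
   - Duplicate keys: controlled by `spark.sql.mapKeyDedupPolicy`; the default `EXCEPTION` (Spark 3.0+) raises an error, while `LAST_WIN` keeps the later entry
   - Struct validation: input that is not an array of two-field structs is rejected by the analyzer's type check, not at runtime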
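
To make the row-level semantics concrete, here is a minimal std-only Rust model for `array<struct<int, string>>` input. The names `DedupPolicy`, `MapError`, `Entry`, and `map_from_entries` are hypothetical and not part of Comet. The behavior modeled (null result when any entry is null, error on a null key, duplicate handling per `spark.sql.mapKeyDedupPolicy`) is an assumption based on Spark 3.x and should be verified against the Spark versions Comet targets.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of Spark's `spark.sql.mapKeyDedupPolicy` setting.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DedupPolicy {
    Exception,
    LastWin,
}

#[derive(Debug, PartialEq)]
pub enum MapError {
    NullKey,
    DuplicateKey(i32),
}

/// One array element. `None` is a null struct; key and value are each nullable.
pub type Entry = Option<(Option<i32>, Option<String>)>;

/// Row-level model: `Ok(None)` stands for a SQL NULL result.
/// The output keeps keys in first-insertion order.
pub fn map_from_entries(
    input: Option<&[Entry]>,
    policy: DedupPolicy,
) -> Result<Option<Vec<(i32, Option<String>)>>, MapError> {
    // A null input array yields a null map.
    let entries = match input {
        None => return Ok(None),
        Some(e) => e,
    };
    // Assumption: any null struct element makes the whole result null.
    if entries.iter().any(|e| e.is_none()) {
        return Ok(None);
    }
    let mut index: HashMap<i32, usize> = HashMap::new();
    let mut out: Vec<(i32, Option<String>)> = Vec::with_capacity(entries.len());
    for (key, value) in entries.iter().flatten() {
        // Map keys can never be null.
        let key = key.ok_or(MapError::NullKey)?;
        match index.get(&key) {
            Some(&pos) => match policy {
                DedupPolicy::Exception => return Err(MapError::DuplicateKey(key)),
                // Later value replaces the earlier one in its original slot.
                DedupPolicy::LastWin => out[pos].1 = value.clone(),
            },
            None => {
                index.insert(key, out.len());
                out.push((key, value.clone()));
            }
        }
    }
    Ok(Some(out))
}
```

A model like this can double as a checklist for Comet's Spark-compatibility tests: one test case per branch.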
   
   **Examples:**
   ```sql
   -- Convert array of structs to map
   SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b')));
   -- Result: {1:"a", 2:"b"}
   
   -- Using with named struct fields
   SELECT map_from_entries(array(struct(1 as id, 'Alice' as name), struct(2 as id, 'Bob' as name)));
   -- Result: {1:"Alice", 2:"Bob"}
   
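   -- Empty array case (an untyped array() is not an array of structs, so cast it first)
   SELECT map_from_entries(cast(array() as array<struct<k:int,v:string>>));
   -- Result: {}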
   ```
   
   ```scala
   // DataFrame API usage
   import org.apache.spark.sql.functions._
   
   df.select(map_from_entries(
     array(
       struct(lit(1), lit("a")),
       struct(lit(2), lit("b"))
     )
   ))
   
   // Using existing array column
   df.select(map_from_entries(col("struct_array")))
   ```
   
   ### Implementation Approach
   
   See the [Comet guide on adding new 
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
 for detailed instructions.
   
   1. **Scala Serde**: Add expression handler in 
`spark/src/main/scala/org/apache/comet/serde/`
   2. **Register**: Add the handler to the appropriate map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if 
needed
   4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has 
built-in support first)
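
For step 4, a native kernel would likely work on columnar buffers rather than on individual rows. The sketch below models a list-of-struct column with plain vectors and computes only the validity of each output row. It does not use the arrow-rs API, and `EntriesColumn` and `output_validity` are hypothetical names. It assumes the Spark 3.x null semantics: a null array or any null struct element yields a null map, and a null key in an otherwise valid row is an error. Duplicate-key handling is left out.

```rust
/// Std-only stand-in for an Arrow-style list<struct<key, value>> column.
pub struct EntriesColumn {
    /// offsets[i]..offsets[i + 1] is the range of entries that belong to row i.
    pub offsets: Vec<usize>,
    /// false means the array in that row is SQL NULL.
    pub row_valid: Vec<bool>,
    /// false means the struct element itself is null.
    pub entry_valid: Vec<bool>,
    /// false means the key field of that entry is null.
    pub key_valid: Vec<bool>,
}

/// Returns the validity of each output map row, or `Err(row)` with the index
/// of the first non-null row that contains a null key.
pub fn output_validity(col: &EntriesColumn) -> Result<Vec<bool>, usize> {
    let num_rows = col.row_valid.len();
    let mut out = Vec::with_capacity(num_rows);
    for row in 0..num_rows {
        let range = col.offsets[row]..col.offsets[row + 1];
        // Null array, or any null struct element, gives a null map.
        if !col.row_valid[row] || !range.clone().all(|i| col.entry_valid[i]) {
            out.push(false);
            continue;
        }
        // Keys in a surviving row must all be non-null.
        if !range.clone().all(|i| col.key_valid[i]) {
            return Err(row);
        }
        out.push(true);
    }
    Ok(out)
}
```

Arrow's map type is physically a list of key/value structs, so much of the work in a real kernel may be validation of this kind plus duplicate-key detection, rather than copying data.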
   
   
   ## Additional context
   
   **Difficulty:** Medium
   **Spark Expression Class:** 
`org.apache.spark.sql.catalyst.expressions.MapFromEntries`
   
   **Related:**
   - `map_entries()` - Reverse operation that converts map to array of structs
   - `map()` - Direct map construction from alternating key-value arguments
   - `struct()` - Creates struct expressions used as input to this function
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

