andygrove opened a new issue, #3164:
URL: https://github.com/apache/datafusion-comet/issues/3164
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
> details have been extracted from Spark documentation and may need verification.
Comet does not currently support Spark's `map_contains_key` function, so
queries that use it fall back to Spark's JVM execution instead of running
natively on DataFusion.
The `MapContainsKey` expression checks whether a given key exists in a map.
It is implemented as a runtime-replaceable expression that internally uses
`ArrayContains` on the map's keys to perform the lookup.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
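The runtime replacement described above can be sketched in plain Rust over ordinary collections. This is an illustration of the rewrite's semantics only; the free functions below are not Comet, Spark, or DataFusion APIs.

```rust
use std::collections::BTreeMap;

// map_keys: extract the map's keys as an array
fn map_keys<K: Clone + Ord, V>(m: &BTreeMap<K, V>) -> Vec<K> {
    m.keys().cloned().collect()
}

// array_contains: linear containment check over the key array
fn array_contains<K: PartialEq>(keys: &[K], key: &K) -> bool {
    keys.contains(key)
}

// map_contains_key(m, k)  ==>  array_contains(map_keys(m), k)
fn map_contains_key<K: Clone + Ord, V>(m: &BTreeMap<K, V>, key: &K) -> bool {
    array_contains(&map_keys(m), key)
}
```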
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
map_contains_key(map_expr, key_expr)
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| map_expr | MapType | The map to search in |
| key_expr | Same as map key type or compatible | The key to search for |
**Return Type:** Returns `BooleanType` - `true` if the key exists in the
map, `false` otherwise.
**Supported Data Types:**
- Map input: Any `MapType` with orderable key types
- Key input: Must be the same type as the map's key type or a type that can
be coerced to it through type widening
- Null key inputs are not supported and will result in a type check error
**Edge Cases:**
- **Null key handling**: Null keys are explicitly rejected during type
checking and will cause a `DataTypeMismatch` error with `NULL_TYPE` subclass
- **Type mismatch**: If key type cannot be coerced to map key type, throws
`DataTypeMismatch` with `MAP_FUNCTION_DIFF_TYPES` subclass
- **Non-orderable keys**: Key types that don't support ordering operations
will fail type validation
- **Empty maps**: Returns `false` for any key lookup in empty maps
- **Null maps**: Standard null propagation rules apply - null map input
returns null result
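The empty-map and null-map cases above can be modeled in plain Rust, using `Option` for SQL `NULL`. This is a semantics sketch only, not Comet code; null keys are omitted because Spark rejects them at type-check time, so only a null map needs modeling.

```rust
use std::collections::BTreeMap;

// Null map (None) propagates to a null result; an empty map yields
// Some(false) for every probe key.
fn map_contains_key_nullable<K: Ord, V>(
    m: Option<&BTreeMap<K, V>>,
    key: &K,
) -> Option<bool> {
    m.map(|map| map.contains_key(key))
}
```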
**Examples:**
```sql
-- Basic usage
SELECT map_contains_key(map(1, 'a', 2, 'b'), 1);
-- Returns: true
SELECT map_contains_key(map(1, 'a', 2, 'b'), 3);
-- Returns: false
-- With string keys
SELECT map_contains_key(map('name', 'John', 'age', '30'), 'name');
-- Returns: true
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(map_contains_key(col("my_map"), lit("search_key")))

// Creating a map and checking for a key
df.select(map_contains_key(
  map(lit("key1"), lit("value1"), lit("key2"), lit("value2")),
  lit("key1")
))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
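Since Spark already rewrites `MapContainsKey` to `ArrayContains(MapKeys(map), key)`, one option for step 4 is to reuse existing kernels rather than writing a new one. As a plain-Rust sketch of the vectorized kernel such an implementation would need, over an offsets-plus-values layout like the one Arrow's `MapArray` uses (keys flattened, per-row offsets): the names below are illustrative, not actual DataFusion or arrow-rs APIs.

```rust
// For each row, check whether its slice of the flattened key buffer
// contains the probe key. An empty slice (empty map) yields false.
fn map_contains_key_kernel(offsets: &[usize], keys: &[i64], probe: i64) -> Vec<bool> {
    offsets
        .windows(2)
        .map(|w| keys[w[0]..w[1]].contains(&probe))
        .collect()
}
```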
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.MapContainsKey`
**Related:**
- `MapKeys` - Extracts all keys from a map
- `ArrayContains` - Underlying implementation for containment check
- `MapValues` - Extracts all values from a map
- `ElementAt` - Retrieves value by key from map
---
*This issue was auto-generated from Spark reference documentation.*