andygrove opened a new issue, #3150:
URL: https://github.com/apache/datafusion-comet/issues/3150
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark `create_array` function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The `CreateArray` expression creates an array from a sequence of individual elements. It takes multiple input expressions as children and combines them into a single array value, handling type coercion to ensure all elements have a compatible common type.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
array(expr1, expr2, ..., exprN)
```
```scala
// DataFrame API
import org.apache.spark.sql.functions.{array, col, lit}
array(col("col1"), col("col2"), lit(3))
```
**Arguments:**

| Argument | Type | Description |
|----------|------|-------------|
| children | Seq[Expression] | Variable number of expressions that will become array elements |
| useStringTypeWhenEmpty | Boolean | Flag determining default element type for empty arrays (StringType if true, NullType if false) |

**Return Type:** `ArrayType` with element type determined by finding the common type among all input expressions. The array is marked as containing nulls if any input expression is nullable.

**Supported Data Types:**
- All primitive types (numeric, string, boolean, binary, date, timestamp)
- Complex types (arrays, maps, structs)
- All input expressions must have compatible types that can be coerced to a common type

**Edge Cases:**
- **Empty arrays**: Uses `defaultElementType` (StringType or NullType) based on the `useStringTypeWhenEmpty` flag
- **Null elements**: Individual null values are preserved as array elements;
the array itself is never null
- **Type coercion**: Automatically promotes compatible types (e.g., int and
long become long array)
- **Mixed nullability**: Array is marked as containing nulls if any input
expression is nullable, even if the actual values are non-null (see the schema check after this list)
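
The coercion and nullability behavior above is easy to observe from a result schema. Below is a minimal sketch, assuming an active `SparkSession` named `spark` and a small DataFrame with a non-nullable int column and a nullable long column; the commented schema shows the expected outcome (elements widened to long, `containsNull = true`) rather than output captured from a run.

```scala
import org.apache.spark.sql.functions.{array, col}

// Assumes an active SparkSession named `spark`.
import spark.implicits._

// `i` is a non-nullable int column, `l` is a nullable long column.
val df = Seq((1, Option(2L)), (3, None: Option[Long])).toDF("i", "l")

// int and long children are coerced to a common long element type, and the
// element is marked as containing nulls because `l` is nullable, even though
// the array value itself is never null.
df.select(array(col("i"), col("l")).alias("arr")).printSchema()
// root
//  |-- arr: array (nullable = false)
//  |    |-- element: long (containsNull = true)
```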
**Examples:**
```sql
-- Basic array creation
SELECT array(1, 2, 3);
-- Result: [1, 2, 3]
-- Mixed compatible types (widened to a common fractional type)
SELECT array(1, 2.5, 3);
-- Result: [1.0, 2.5, 3.0]
-- String array
SELECT array('a', 'b', 'c');
-- Result: ["a", "b", "c"]
-- Array with nulls
SELECT array(1, null, 3);
-- Result: [1, null, 3]
-- Empty array
SELECT array();
-- Result: []
```
```scala
// DataFrame API examples
import org.apache.spark.sql.functions._
// Create array from column values
df.select(array(col("col1"), col("col2"), col("col3")))
// Mix columns and literals
df.select(array(col("name"), lit("constant"), col("other")))
// Nested in other operations
df.select(array(col("a"), col("b")).alias("arr"))
  .filter(size(col("arr")) > 0)
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add an expression handler in
`spark/src/main/scala/org/apache/comet/serde/` (a rough sketch of this handler follows the list)
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
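
Since DataFusion already ships a `make_array` scalar function for building arrays, step 4 may need little or no new Rust code, and most of the work is likely in step 1. The sketch below only illustrates the general shape of such a handler: the helper names `exprToProtoInternal` and `scalarFunctionExprToProto`, their signatures, and the generated proto classes are assumptions and should be verified against the current `QueryPlanSerde.scala` and `expr.proto` rather than taken as-is.

```scala
// Hypothetical sketch only: helper names, signatures, and proto classes below
// are assumptions and must be checked against the current Comet serde code.
import org.apache.spark.sql.catalyst.expressions.{Attribute, CreateArray}

import org.apache.comet.serde.ExprOuterClass.Expr
import org.apache.comet.serde.QueryPlanSerde.{exprToProtoInternal, scalarFunctionExprToProto}

object CometCreateArray {

  // Serialize each child expression, then emit a call to DataFusion's
  // `make_array` scalar function with those children as arguments.
  def convert(expr: CreateArray, inputs: Seq[Attribute]): Option[Expr] = {
    val childProtos = expr.children.map(exprToProtoInternal(_, inputs))
    if (childProtos.forall(_.isDefined)) {
      scalarFunctionExprToProto("make_array", childProtos: _*)
    } else {
      // If any child cannot be converted, return None so the whole
      // expression falls back to Spark.
      None
    }
  }
}
```

The handler would then be registered in the expression mapping in `QueryPlanSerde.scala` (step 2) so the planner can route `CreateArray` through it.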
## Additional context
**Difficulty:** Medium

**Spark Expression Class:** `org.apache.spark.sql.catalyst.expressions.CreateArray`

**Related:**
- `CreateMap` - Creates map structures from key-value pairs
- `CreateNamedStruct` - Creates struct objects from field name-value pairs
- `GetArrayItem` - Extracts elements from arrays by index
- `ArrayContains` - Checks if array contains specific values
---
*This issue was auto-generated from Spark reference documentation.*