zhanglistar opened a new issue, #11488:
URL: https://github.com/apache/incubator-gluten/issues/11488
### Backend
VL (Velox)
### Bug description
## Description
`aggregate` (mapped to `sparkArrayFold`) fails when the lambda argument types
(accumulator/element) do not exactly match the runtime-captured types,
especially with nullable arrays or nullable struct fields. The query then
either throws a runtime exception or returns incorrect null results.
## Steps to Reproduce
1. Create a table with an `array<struct<...>>` column that includes null elements and null fields.
2. Run `aggregate` with a merge lambda and a finish lambda, e.g.:
   - the merge lambda uses a struct accumulator
   - the finish lambda reads the accumulator fields
## Expected Behavior
Results match vanilla Spark.
## Actual Behavior
Runtime error (type mismatch in lambda capture) or incorrect null outputs.
## Proposed Fix
- Align accumulator/element types with merge lambda argument types.
- Cast array elements to the expected lambda element type when needed.
- Add tests for nested struct + nulls.
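
One way to picture the first two bullets is a small type-alignment check. The Python sketch below is only an illustration, not Gluten or Velox code: `Scalar`, `Struct`, `same_shape`, and `align` are hypothetical names, and it assumes the mismatch is limited to nullability flags (field names and base types must still match, which is a simplification).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scalar:
    name: str              # e.g. "int", "double"
    nullable: bool = True


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "Type"], ...]   # (field name, field type) pairs
    nullable: bool = True


Type = Union[Scalar, Struct]


def same_shape(a: Type, b: Type) -> bool:
    """True when a and b are identical once nullability flags are ignored."""
    if isinstance(a, Scalar) and isinstance(b, Scalar):
        return a.name == b.name
    if isinstance(a, Struct) and isinstance(b, Struct):
        return len(a.fields) == len(b.fields) and all(
            na == nb and same_shape(ta, tb)
            for (na, ta), (nb, tb) in zip(a.fields, b.fields)
        )
    return False


def align(actual: Type, expected: Type) -> str:
    """Decide how to feed a value of type `actual` to a lambda expecting `expected`."""
    if actual == expected:
        return "as-is"     # exact match, nothing to do
    if same_shape(actual, expected):
        return "cast"      # differs only in nullability: insert a cast
    raise TypeError(f"cannot align {actual} with {expected}")


# Element type stored in the array vs. the type the merge lambda was built for.
stored = Struct((("v", Scalar("int")), ("w", Scalar("double"))), nullable=True)
wanted = Struct(
    (("v", Scalar("int", False)), ("w", Scalar("double", False))), nullable=False
)
decision = align(stored, wanted)
```

Under these assumptions `decision` is `"cast"`, while `align(stored, stored)` yields `"as-is"` and genuinely different types raise instead of being cast silently.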
## Reproduction SQL
```sql
-- setup
CREATE TABLE tb_array_complex(items ARRAY<STRUCT<v:INT, w:DOUBLE>>) USING parquet;
INSERT INTO tb_array_complex VALUES
  (array(named_struct('v', 1, 'w', 1.5), named_struct('v', null, 'w', 2.0), null)),
  (array()),
  (null),
  (array(named_struct('v', 2, 'w', null), named_struct('v', 3, 'w', 4.5)));

-- reproduce
SELECT
  aggregate(
    items,
    cast(struct(0 as cnt, 0.0 as sum) as struct<cnt:int, sum:double>),
    (acc, x) -> struct(
      acc.cnt + if(x is null or x.v is null, 0, 1),
      acc.sum + coalesce(x.w, 0.0)
    ),
    acc -> if(acc.cnt = 0, cast(null as double), acc.sum / acc.cnt)
  ) AS avg_w
FROM tb_array_complex;
```
```sql
-- setup
CREATE TABLE tb_array_simple(ids ARRAY<INT>) USING parquet;
INSERT INTO tb_array_simple VALUES
  (array(1,5,2,null,3)),
  (array(1,1,3,2)),
  (null),
  (array());

-- reproduce
SELECT
  aggregate(
    ids,
    cast(struct(0 as cnt, 0.0 as sum) as struct<cnt:int, sum:double>),
    (acc, x) -> struct(acc.cnt + 1, acc.sum + coalesce(cast(x as double), 0.0)),
    acc -> acc.sum
  ) AS sum_v
FROM tb_array_simple;
```
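
For reference, the results these two queries should produce can be worked out by hand. The Python sketch below models the fold semantics of `aggregate` and replays both queries on the inserted rows. It is a plain-Python model, not output captured from Spark or Gluten; `spark_aggregate` and the helper names are hypothetical, and it assumes that a NULL array yields a NULL result.

```python
from functools import reduce


def spark_aggregate(arr, zero, merge, finish):
    """Model of aggregate(arr, zero, merge, finish); a NULL array gives NULL."""
    if arr is None:
        return None
    return finish(reduce(merge, arr, zero))


# Query 1: average of w over elements whose v is not null.
def merge_avg(acc, x):
    cnt, total = acc
    v = None if x is None else x["v"]   # x.v on a null struct is null
    w = None if x is None else x["w"]   # x.w on a null struct is null
    return (cnt + (0 if v is None else 1), total + (0.0 if w is None else w))


def finish_avg(acc):
    cnt, total = acc
    return None if cnt == 0 else total / cnt


# Query 2: sum of ids, treating null elements as 0.0.
def merge_sum(acc, x):
    cnt, total = acc
    return (cnt + 1, total + (0.0 if x is None else float(x)))


def finish_sum(acc):
    return acc[1]


complex_rows = [
    [{"v": 1, "w": 1.5}, {"v": None, "w": 2.0}, None],
    [],
    None,
    [{"v": 2, "w": None}, {"v": 3, "w": 4.5}],
]
simple_rows = [[1, 5, 2, None, 3], [1, 1, 3, 2], None, []]

avg_w = [spark_aggregate(r, (0, 0.0), merge_avg, finish_avg) for r in complex_rows]
sum_v = [spark_aggregate(r, (0, 0.0), merge_sum, finish_sum) for r in simple_rows]

print(avg_w)  # [3.5, None, None, 2.25]
print(sum_v)  # [11.0, 7.0, None, 0.0]
```

The values are listed in insertion order; a real query gives no ordering guarantee without an `ORDER BY`.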
### Gluten version
main branch
### Spark version
Spark-3.3.x
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```