zhanglistar opened a new issue, #11488:
URL: https://github.com/apache/incubator-gluten/issues/11488

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   ## Description
   `aggregate` (mapped to `sparkArrayFold`) fails when the lambda argument
   types (accumulator and element) do not exactly match the runtime-captured
   types, especially with nullable arrays or nullable struct fields. The query
   then either throws a runtime exception or returns incorrect null results.
   
   ## Steps to Reproduce
   1. Create a table with an `array<struct<...>>` column that includes null elements and null fields.
   2. Run `aggregate` with both a merge lambda and a finish lambda, for example:
      - the merge lambda uses a struct accumulator
      - the finish lambda reads the accumulator's fields
   
   ## Expected Behavior
   Results match vanilla Spark.
   
   ## Actual Behavior
   The query fails with a runtime error (type mismatch in lambda capture) or returns incorrect null outputs.
   
   ## Proposed Fix
   - Align accumulator/element types with merge lambda argument types.
   - Cast array elements to the expected lambda element type when needed.
   - Add tests for nested struct + nulls.
   
   ## Reproduction SQL
   -- setup
   CREATE TABLE tb_array_complex(items ARRAY<STRUCT<v:INT, w:DOUBLE>>) USING parquet;
   
   INSERT INTO tb_array_complex VALUES
     (array(named_struct('v', 1, 'w', 1.5), named_struct('v', null, 'w', 2.0), null)),
     (array()),
     (null),
     (array(named_struct('v', 2, 'w', null), named_struct('v', 3, 'w', 4.5)));
   
   -- reproduce
   SELECT
     aggregate(
       items,
       cast(struct(0 as cnt, 0.0 as sum) as struct<cnt:int, sum:double>),
       (acc, x) -> struct(
         acc.cnt + if(x is null or x.v is null, 0, 1),
         acc.sum + coalesce(x.w, 0.0)
       ),
       acc -> if(acc.cnt = 0, cast(null as double), acc.sum / acc.cnt)
     ) AS avg_w
   FROM tb_array_complex;
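
To make "results match vanilla Spark" concrete for the query above, here is a minimal Python sketch of the fold it describes. It is a hand-written model, not output captured from Spark or Gluten. It assumes Spark's usual null rules: a NULL array yields NULL, `x.w` is NULL when `x` is NULL, and `coalesce` substitutes `0.0`. The names `merge`, `finish`, `avg_w`, and `expected_avg_w` are illustrative only.

```python
from functools import reduce

# Rows inserted into tb_array_complex. Structs are modeled as dicts, SQL NULL as None.
rows = [
    [{"v": 1, "w": 1.5}, {"v": None, "w": 2.0}, None],
    [],
    None,
    [{"v": 2, "w": None}, {"v": 3, "w": 4.5}],
]

def merge(acc, x):
    """Model of the merge lambda: (acc, x) -> struct(cnt, sum)."""
    cnt, total = acc
    # if(x is null or x.v is null, 0, 1)
    counted = 0 if x is None or x["v"] is None else 1
    # x.w is NULL when x itself is NULL; coalesce(x.w, 0.0) then yields 0.0
    w = x["w"] if x is not None else None
    return (cnt + counted, total + (w if w is not None else 0.0))

def finish(acc):
    """Model of the finish lambda: NULL when cnt = 0, otherwise sum / cnt."""
    cnt, total = acc
    return None if cnt == 0 else total / cnt

def avg_w(items):
    # Assumption: a NULL input array makes the whole aggregate NULL.
    if items is None:
        return None
    return finish(reduce(merge, items, (0, 0.0)))

expected_avg_w = [avg_w(items) for items in rows]
print(expected_avg_w)
```

Under this model the four rows evaluate to 3.5, NULL, NULL, and 2.25 in insertion order. Only the null elements and null fields affect the count and the sum, so these are the rows a Gluten fix should be checked against.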
   
   -- setup
   CREATE TABLE tb_array_simple(ids ARRAY<INT>) USING parquet;
   INSERT INTO tb_array_simple VALUES (array(1,5,2,null,3)), (array(1,1,3,2)), (null), (array());
   
   -- reproduce
   SELECT
     aggregate(
       ids,
       cast(struct(0 as cnt, 0.0 as sum) as struct<cnt:int, sum:double>),
       (acc, x) -> struct(acc.cnt + 1, acc.sum + coalesce(cast(x as double), 0.0)),
       acc -> acc.sum
     ) AS sum_v
   FROM tb_array_simple;
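
The simple-array query can be modeled the same way. This is a rough sketch under the same assumptions (a NULL array yields NULL, and a NULL element contributes `0.0` through `coalesce`); it is not output captured from Spark. The names `merge_simple`, `sum_v`, and `expected_sum_v` are illustrative only.

```python
from functools import reduce

# Rows inserted into tb_array_simple, with SQL NULL modeled as None.
rows = [[1, 5, 2, None, 3], [1, 1, 3, 2], None, []]

def merge_simple(acc, x):
    """Model of (acc, x) -> struct(acc.cnt + 1, acc.sum + coalesce(cast(x as double), 0.0))."""
    cnt, total = acc
    return (cnt + 1, total + (float(x) if x is not None else 0.0))

def sum_v(ids):
    # Assumption: a NULL input array makes the whole aggregate NULL.
    if ids is None:
        return None
    # The finish lambda only reads acc.sum, the second field of the accumulator.
    return reduce(merge_simple, ids, (0, 0.0))[1]

expected_sum_v = [sum_v(ids) for ids in rows]
print(expected_sum_v)
```

Under this model the rows evaluate to 11.0, 7.0, NULL, and 0.0. The empty array returns the zero value's `sum` field, not NULL, which distinguishes it from the NULL-array row.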
   
   ### Gluten version
   
   main branch
   
   ### Spark version
   
   Spark-3.3.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

