jiangjiangtian opened a new issue, #6630:
URL: https://github.com/apache/incubator-gluten/issues/6630
### Backend
VL (Velox)
### Bug description
I have a SQL query that I ran on both Gluten and vanilla Spark. It has the
following form:
```SQL
select count(*) from (
    (
        select *
        from test1
        where xxx
    ) a
    left join
    (
        select col_a, col_b, col_c, col_d, col_e
        from test2
        where xxx
        group by col_a
                ,col_b
                ,col_c
                ,col_d
                ,col_e
    ) b
    ON a.col1 = b.col1
);
```
The two engines return different row counts. Looking at the Spark UI, I found
that the number of output rows of the second subquery (`b`) does not match.
vanilla spark:

gluten:
<img width="466" alt="image"
src="https://github.com/user-attachments/assets/93720b7a-5f98-41c6-94b9-5418d56131e1">
<img width="371" alt="image"
src="https://github.com/user-attachments/assets/f6b887cc-bb94-4118-9b0e-7caf2e3c1fa7">
Looking closer, I found that some rows are duplicated.
But when I run the second subquery on its own, I get the correct result.
<img width="489" alt="image"
src="https://github.com/user-attachments/assets/38d0406d-12e5-4288-a3e8-25ca5f750d27">
<img width="438" alt="image"
src="https://github.com/user-attachments/assets/90721893-2773-44d3-8fe2-de1faa1a78fe">
We can see that the plans differ: when the subquery runs on its own, the
second hash aggregation is a regular one. So I suspect a bug in flushable hash
aggregation or in the plan conversion, but I have not been able to find a
small SQL query that reproduces it.
Sorry for not having a minimal example.
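
To illustrate why a flushable aggregation could explain duplicated rows, here is a toy Python model. It is only a sketch of the general idea, not Gluten or Velox code: the names `flushable_partial_agg`, `final_agg`, `duplicate_keys` and the `max_groups` threshold are all hypothetical. It assumes a partial `GROUP BY` that empties its hash table whenever it holds `max_groups` keys, so the same key can be emitted more than once unless a later final aggregation merges the output.

```python
from collections import Counter


def flushable_partial_agg(rows, max_groups):
    """Toy partial GROUP BY: emits and clears its hash table when it is full.

    Hypothetical model only; it does not mirror any real engine's code.
    """
    table = set()
    out = []
    for key in rows:
        table.add(key)
        if len(table) >= max_groups:
            out.extend(sorted(table))  # flush the current groups downstream
            table.clear()
    out.extend(sorted(table))  # emit whatever is left at end of input
    return out


def final_agg(partial_rows):
    """Toy final GROUP BY: merges repeated keys into one row each."""
    return list(dict.fromkeys(partial_rows))


def duplicate_keys(rows):
    """Return every key that appears more than once, with its count."""
    return {key: n for key, n in Counter(rows).items() if n > 1}


rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("a", 1), ("b", 2)]

partial = flushable_partial_agg(rows, max_groups=2)
merged = final_agg(partial)

print("partial only:", len(partial), "rows, duplicates:", duplicate_keys(partial))
print("with final agg:", len(merged), "rows, duplicates:", duplicate_keys(merged))
```

In this model the partial output alone contains repeated keys, while the output after `final_agg` is unique. If the plan conversion ever left a flushable aggregation without a merging aggregation after it, the symptom would look like the one reported here. A check similar to `duplicate_keys` (grouping the subquery output by all five columns and keeping groups with a count above 1) may help narrow down which keys are affected.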
### Spark version
None
### Spark configurations
_No response_
### System information
Velox System Info v0.0.2
Commit: 96712646c63bf4305cca4eaa7dfd26c2179547b1
CMake Version: 3.17.5
System: Linux-3.10.0-862.mt20190308.130.el7.x86_64
Arch: x86_64
CPU Name: Model name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
C++ Compiler: /opt/rh/devtoolset-10/root/usr/bin/c++
C++ Compiler Version: 10.2.1
C Compiler: /opt/rh/devtoolset-10/root/usr/bin/cc
C Compiler Version: 10.2.1
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]