jiangjiangtian opened a new issue, #6630:
URL: https://github.com/apache/incubator-gluten/issues/6630

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   I have a SQL query that I ran on both Gluten and vanilla Spark. Its shape 
is as follows:
   ```SQL
   select count(*) from ((
       select *
       from test1
       where xxx
     )a
     left join
     (
       select col_a, col_b, col_c, col_d, col_e
       from test2
       where xxx
       group by col_a
               ,col_b
               ,col_c
               ,col_d
               ,col_e
     )b
   ON a.col1 = b.col1);
   ```
   The two runs return different row counts. Looking at the Spark UI, I found 
that the cause is the second subquery: its output row count does not match 
between the two runs.
   Vanilla Spark:
   
![image](https://github.com/user-attachments/assets/63069d6f-4702-42e7-a59a-222bb073bdd7)
   
   Gluten:
   <img width="466" alt="image" src="https://github.com/user-attachments/assets/93720b7a-5f98-41c6-94b9-5418d56131e1">
   <img width="371" alt="image" src="https://github.com/user-attachments/assets/f6b887cc-bb94-4118-9b0e-7caf2e3c1fa7">
   
   In fact, I found that some rows in the Gluten output are duplicates.
   But when I run the second subquery on its own, I get the correct result.
   <img width="489" alt="image" src="https://github.com/user-attachments/assets/38d0406d-12e5-4288-a3e8-25ca5f750d27">
   <img width="438" alt="image" src="https://github.com/user-attachments/assets/90721893-2773-44d3-8fe2-de1faa1a78fe">
   The plans differ: in the standalone run, the second hash aggregation is a 
regular one.
   So I suspect a bug in either the flushable hash aggregation or the plan 
conversion, but I have not been able to find a small SQL query that 
reproduces it.
   I'm sorry for not having a minimal example.
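
To make the suspicion above concrete, here is a toy model of how a flushable partial aggregation could produce duplicate rows if no final (regular) aggregation merges its output. This is not Gluten or Velox code. The function names, the `max_groups` limit, and the flush policy are all assumptions made for illustration, and the sketch does not claim to show the actual cause of this issue.

```python
from collections import OrderedDict


def flushable_partial_distinct(rows, max_groups):
    """Toy partial aggregation for a GROUP BY with no aggregate functions.

    Hypothetical model: it buffers at most `max_groups` distinct keys.
    When the buffer is full and a new key arrives, it flushes (emits)
    everything buffered so far and starts over, so a key that shows up
    again after a flush is emitted a second time.
    """
    seen = OrderedDict()
    out = []
    for key in rows:
        if key not in seen:
            if len(seen) >= max_groups:
                out.extend(seen)  # flush the buffered group keys
                seen.clear()
            seen[key] = None
    out.extend(seen)  # emit whatever is still buffered at end of input
    return out


def final_distinct(partial_rows):
    """Toy final aggregation: merges partial output into one row per key."""
    return list(OrderedDict.fromkeys(partial_rows))


rows = [("a", 1), ("b", 2), ("c", 3), ("a", 1), ("b", 2)]
partial = flushable_partial_distinct(rows, max_groups=2)
final = final_distinct(partial)

print(len(partial), len(final))
```

In this model the partial stage alone emits five rows for three distinct keys, and only the final stage restores one row per group. If a plan fed the flushable stage's output directly into the join, the row count would be inflated in the same way.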
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   Velox System Info v0.0.2
   Commit: 96712646c63bf4305cca4eaa7dfd26c2179547b1
   CMake Version: 3.17.5
   System: Linux-3.10.0-862.mt20190308.130.el7.x86_64
   Arch: x86_64
   CPU Name: Model name:            Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
   C++ Compiler: /opt/rh/devtoolset-10/root/usr/bin/c++
   C++ Compiler Version: 10.2.1
   C Compiler: /opt/rh/devtoolset-10/root/usr/bin/cc
   C Compiler Version: 10.2.1
   CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

