NEUpanning opened a new issue, #8787:
URL: https://github.com/apache/incubator-gluten/issues/8787

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   We are running into an issue where only a subset of the probe table's rows 
match the build table, but the Gluten BHJ (broadcast hash join) returns as many 
rows as the probe table outputs. The join condition does not filter out any 
rows. The Velox plan shows that the join condition filter is pushed down to the 
table scan, where it does not filter out any rows either.
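
For reference, the result we expect from a LEFT SEMI join can be sketched in plain Python as below. This is only an illustration of the join semantics, not Gluten or Velox code: the `left_semi_join` helper, the column names, and the sample rows are all hypothetical and are not taken from the failing query.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_semi_join(
    probe_rows: List[Row], build_rows: List[Row], probe_key: str, build_key: str
) -> List[Row]:
    """Return each probe row at most once if its key exists on the build side."""
    # NULL (None) keys never match, mirroring SQL equality semantics.
    build_keys = {
        row[build_key] for row in build_rows if row[build_key] is not None
    }
    return [row for row in probe_rows if row[probe_key] in build_keys]


# Hypothetical data: 10 probe rows, 2 distinct build keys.
probe = [{"id": i, "k": i % 5} for i in range(10)]
build = [{"k": 0}, {"k": 3}]

result = left_semi_join(probe, build, "k", "k")
print(len(probe), len(result))
```

With this sample data only the probe rows whose `k` is 0 or 3 survive, so the output is strictly smaller than the probe input. In the plan reported here, HashProbe instead emits all 135 input rows.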
   
   vanilla plan:
   
   <img width="595" alt="Image" src="https://github.com/user-attachments/assets/163ebf52-62e0-4d72-b9aa-8e79ad6032cd" />
   
   gluten plan:
   
   <img width="402" alt="Image" src="https://github.com/user-attachments/assets/f8ef76ca-d6c3-4efb-9bb7-1d1545b4a1fb" />
   
   velox plan:
   ```
   -- HashJoin[3][LEFT SEMI (FILTER) n0_14=n2_0] -> n0_0:BIGINT, n0_1:VARCHAR, n0_2:BIGINT, n0_3:BIGINT, n0_4:VARCHAR, n0_5:INTEGER, n0_6:INTEGER, n0_7:BIGINT, n0_8:VARCHAR, n0_9:BIGINT, n0_10:BIGINT, n0_11:BIGINT, n0_12:BIGINT, n0_13:INTEGER, n0_14:BIGINT, n0_15:BIGINT, n0_16:INTEGER, n0_17:VARCHAR, n0_18:VARCHAR, n0_19:VARCHAR, n0_20:VARCHAR, n0_21:VARCHAR, n0_22:VARCHAR
            Output: 135 rows (295.33KB, 1 batches), Cpu time: 129.13us, Wall time: 139.36us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, CPU breakdown: B/I/O/F (85.06us/612ns/39.82us/3.63us)
            HashBuild: Input: 2 rows (32B, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 13.00us, Wall time: 15.54us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, Threads: 1, CPU breakdown: B/I/O/F (10.05us/0ns/1.50us/1.45us)
               distinctKey0                 sum: 3, count: 1, min: 3, max: 3
               hashtable.buildWallNanos     sum: 84.66us, count: 1, min: 84.66us, max: 84.66us
               hashtable.capacity           sum: 3, count: 1, min: 3, max: 3
               hashtable.numDistinct        sum: 2, count: 1, min: 2, max: 2
               hashtable.numRehashes        sum: 1, count: 1, min: 1, max: 1
               queuedWallNanos              sum: 0ns, count: 1, min: 0ns, max: 0ns
               rangeKey0                    sum: 3, count: 1, min: 3, max: 3
               runningAddInputWallNanos     sum: 0ns, count: 1, min: 0ns, max: 0ns
               runningFinishWallNanos       sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
               runningGetOutputWallNanos    sum: 2.08us, count: 1, min: 2.08us, max: 2.08us
            HashProbe: Input: 135 rows (295.33KB, 1 batches), Output: 135 rows (295.33KB, 1 batches), Cpu time: 116.13us, Wall time: 123.82us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU breakdown: B/I/O/F (75.01us/612ns/38.32us/2.19us)
               dynamicFiltersProduced           sum: 1, count: 1, min: 1, max: 1
               queuedWallNanos                  sum: 1.00us, count: 1, min: 1.00us, max: 1.00us
               replacedWithDynamicFilterRows    sum: 135, count: 1, min: 135, max: 135
               runningAddInputWallNanos         sum: 885ns, count: 1, min: 885ns, max: 885ns
               runningFinishWallNanos           sum: 3.65us, count: 1, min: 3.65us, max: 3.65us
               runningGetOutputWallNanos        sum: 39.83us, count: 1, min: 39.83us, max: 39.83us
   ```
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]