xinyuezg commented on PR #8965:
URL: https://github.com/apache/incubator-gluten/pull/8965#issuecomment-3353138717

   > @xinyuezg So what does the data diff look like in your env? Are there any duplicate records?
   
   Examples we encountered in `GlutenOuterJoinSuiteForceShjOff`:
   
   * no condition full outer join using BroadcastNestedLoopJoin build left (whole-stage-codegen off)
   * no condition full outer join using BroadcastNestedLoopJoin build left (whole-stage-codegen on)
   * no condition full outer join using BroadcastNestedLoopJoin build right (whole-stage-codegen off)
   * no condition full outer join using BroadcastNestedLoopJoin build right (whole-stage-codegen on)
   
   Zooming in on no condition full outer join using BroadcastNestedLoopJoin build left:
   
   Initial input data:
   ```
   left(a, b)  
   (1, 2.0)  
   (2, 100.0)  
   (2, 1.0)  
   (2, 1.0)   ← duplicated
   (3, 3.0)  
   (5, 1.0)  
   (6, 6.0)  
   (null, null)  
     
   right(c, d)  
   (0, 0.0)  
   (2, 3.0)  
   (2, -1.0)  
   (2, -1.0)  ← duplicated  
   (2, 3.0)   ← duplicated  
   (3, 2.0)  
   (4, 1.0)  
   (5, 3.0)  
   (7, 7.0)  
   (null, null)
   ```
   
   Spark plan:
   ```
   BroadcastNestedLoopJoin BuildLeft, FullOuter
   :- Filter (isnotnull(a#220) AND (a#220 = 2))
   :  +- Scan ExistingRDD[a#220,b#221]
   +- Filter (isnotnull(c#226) AND (c#226 = 2))
      +- Scan ExistingRDD[c#226,d#227]
   ```
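The two `Filter` nodes in the plan can be mirrored in plain Python to double-check which rows actually reach the join. This is a hedged sketch: the lists are transcribed from the input data above, `None` stands in for SQL NULL, and the names `left`, `right`, `keep_key_2`, `filtered_left` and `filtered_right` are illustrative, not taken from the test suite.

```python
# Input rows transcribed from the test data; None stands in for SQL NULL.
left = [(1, 2.0), (2, 100.0), (2, 1.0), (2, 1.0), (3, 3.0),
        (5, 1.0), (6, 6.0), (None, None)]
right = [(0, 0.0), (2, 3.0), (2, -1.0), (2, -1.0), (2, 3.0),
         (3, 2.0), (4, 1.0), (5, 3.0), (7, 7.0), (None, None)]


def keep_key_2(rows):
    """Mirror Filter (isnotnull(key) AND (key = 2)) on the first column."""
    return [r for r in rows if r[0] is not None and r[0] == 2]


filtered_left = keep_key_2(left)    # build side (BuildLeft)
filtered_right = keep_key_2(right)  # probe side

print(len(filtered_left), len(filtered_right))  # 3 4
```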
   
   So the effective inputs to the join are:
   * filteredLeft (build side) = {(2, 100.0), (2, 1.0), (2, 1.0)} → 3 rows
   * filteredRight (probe side) = {(2, 3.0), (2, -1.0), (2, -1.0), (2, 3.0)} → 4 rows
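Because the join has no condition, every build row matches every probe row, so a correct full outer join over these inputs is just the 3 × 4 = 12-row cross product. NULL-padded rows should only appear for a row that matched nothing, which here can only happen when the other side is empty. A generic nested-loop full outer join that keeps one "matched" flag per build row illustrates this. It is a minimal sketch of the textbook algorithm, not Spark's or Gluten's implementation, and `nested_loop_full_outer` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Row = Tuple[object, ...]


def nested_loop_full_outer(
    build: List[Row],
    probe: List[Row],
    build_width: int,
    probe_width: int,
    cond: Callable[[Row, Row], bool] = lambda b, p: True,  # no join condition
) -> List[Row]:
    """Emit build columns followed by probe columns, as in a BuildLeft join."""
    matched_build = [False] * len(build)  # one "matched" flag per build row
    out: List[Row] = []
    for p in probe:
        probe_matched = False
        for i, b in enumerate(build):
            if cond(b, p):
                out.append(b + p)
                matched_build[i] = True
                probe_matched = True
        if not probe_matched:
            # Probe row with no partner: pad the build columns with NULLs.
            out.append((None,) * build_width + p)
    # Only build rows that never matched any probe row get a NULL-padded row.
    for i, b in enumerate(build):
        if not matched_build[i]:
            out.append(b + (None,) * probe_width)
    return out


build = [(2, 100.0), (2, 1.0), (2, 1.0)]
probe = [(2, 3.0), (2, -1.0), (2, -1.0), (2, 3.0)]
rows = nested_loop_full_outer(build, probe, build_width=2, probe_width=2)
print(len(rows))  # 12
```

With an empty probe side, every build row would instead come back NULL-padded; that is the only situation in which rows like `[2,1.0,null,null]` belong in the output of this query.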
   
   Results:
   ```
    == Results ==
    !== Expected Answer - 12 ==   == Actual Answer - 15 ==
    [2,1.0,2,-1.0]               [2,1.0,2,-1.0]
    [2,1.0,2,-1.0]               [2,1.0,2,-1.0]
    [2,1.0,2,-1.0]               [2,1.0,2,-1.0]
    [2,1.0,2,-1.0]               [2,1.0,2,-1.0]
    [2,1.0,2,3.0]                [2,1.0,2,3.0]
    [2,1.0,2,3.0]                [2,1.0,2,3.0]
    [2,1.0,2,3.0]                [2,1.0,2,3.0]
    [2,1.0,2,3.0]                [2,1.0,2,3.0]
   ![2,100.0,2,-1.0]             [2,1.0,null,null]
   ![2,100.0,2,-1.0]             [2,1.0,null,null]
   ![2,100.0,2,3.0]              [2,100.0,2,-1.0]
   ![2,100.0,2,3.0]              [2,100.0,2,-1.0]
   !                             [2,100.0,2,3.0]
   !                             [2,100.0,2,3.0]
   !                             [2,100.0,null,null]
   ```
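Comparing the two columns as multisets makes the diff explicit: the actual answer contains all 12 expected rows plus 3 extra NULL-padded rows, exactly one per build-side (left) row. That pattern is consistent with build rows being emitted as unmatched even though each of them did match, although the comparison alone does not show where the matched state is lost. The sketch below only transcribes the result table into Python `Counter`s; the variable names are illustrative.

```python
from collections import Counter

# Rows transcribed from the "Expected Answer" column (None = null).
expected = Counter({
    (2, 1.0, 2, -1.0): 4,
    (2, 1.0, 2, 3.0): 4,
    (2, 100.0, 2, -1.0): 2,
    (2, 100.0, 2, 3.0): 2,
})
# Rows transcribed from the "Actual Answer" column.
actual = Counter({
    (2, 1.0, 2, -1.0): 4,
    (2, 1.0, 2, 3.0): 4,
    (2, 1.0, None, None): 2,
    (2, 100.0, 2, -1.0): 2,
    (2, 100.0, 2, 3.0): 2,
    (2, 100.0, None, None): 1,
})

missing = expected - actual  # expected rows absent from the actual answer
extra = actual - expected    # rows in the actual answer that should not be there

# The filtered build side; each extra row should correspond to one of these.
build_side = Counter([(2, 100.0), (2, 1.0), (2, 1.0)])
extra_build_rows = Counter(row[:2] for row in extra.elements())

print(sum(expected.values()), sum(actual.values()))  # 12 15
print(dict(missing))                                 # {}
print(extra_build_rows == build_side)                # True
```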
   
   

