beliefer commented on issue #11980:
URL: https://github.com/apache/gluten/issues/11980#issuecomment-4396090101

   Take for example.
   ```
   set spark.gluten.sql.columnar.forceShuffledHashJoin=false;
   set spark.gluten.sql.columnar.shuffledHashJoin=false;
   SELECT /*+ MERGE(htmp1) */ htmp1.c2 AS 1c2, htmp1.c3 AS 1c3, htmp2.c2 AS 2c2 
FROM htmp1 JOIN htmp2 ON htmp1.c1 = htmp2.c1;
   ```
   
   Execution plan structure (confirmed)
   
   ```
     SortMergeJoinExecTransformer  →  Output 0 rows, 0.0 B                      
                 
     :- SortExecTransformer (left)  →  100 rows, 2.3 KiB                        
                                                                                
           
     :  +- InputIteratorTransformer                                             
                                                                                
         
     :     +- RowToVeloxColumnar          ← htmp1 Follow the vanilla path       
                                                                                
                 
     :        +- Scan hive htmp1          ← 100 rows                            
                                                                                
           
     +- SortExecTransformer (right) →  100 rows, 4.2 KiB                        
                                                                                
           
        +- InputIteratorTransformer                                             
                                                                                
         
           +- NativeScan hive htmp2       ← htmp2 takes the Velox native path, 
100 rows
   ```
   
   100 rows on each side are sent to `SortExec`, 100 rows on each side are sent 
to SMJ, and SMJ outputs 0 rows. The data is fine, the problem lies in comparing 
the join keys.                                                                 
     
   ---                                                                          
                                                                                
       
   Root cause: Different internal representations of DECIMAL generated by two 
read paths
                                                                                
                                                                                
         
   It's not an issue with `RowToVeloxColumnar` itself, it's just a format 
bridge. The real root cause lies in the fact that two paths generate 
semantically equivalent but internally encoded DECIMAL values.
                                                                                
                                                                                
         
   Left side (Scan hive → RowToVeloxColumnar):
                                                                                
                                                                                
         
   1. Hive Metastore record c1 column type is DECIMAL(20, 0)                    
                                                                                
               
   2. However, the type written in the footer of the ORC file is decimal (38, 
18) (which is the schema inconsistency you observed)
   3. Scan hive (vanilla Spark ORC reader) to read using Metastore's type 
DECIMAL (20,0), and the values read are 1, 2, 3 .. (integer)
   4. RowToVeloxColumnar converts Spark's DECIMAL (20,0) to Velox's int128 with 
an accuracy of 20 scale 0, Internal value example: 1 → 0x00000001
                                                                                
                                                                                
         
   Right side (NativeScan hive):
                                                                                
                                                                                
         
   1. Velox's ORC reader directly reads the type in the footer of the ORC file, 
which is decimal (38, 18)
   2. Under the encoding of scale=18, the internal value of int128 is 1 × 10 ^ 
18=10000000000000000
   3. Namely: 1 → 0 x0DE0B6B3A7640000                                           
                                                                                
            
     
   SMJ performs key comparison in the Velox layer:
                                                               
   Left key: int128=1 (scale=0)
   Right key: int128=10 ^ 18 (scale=18)
   Comparison result: Unequal → No matching rows
     
   Therefore, no match can be found in the Cartesian space of 100 × 100, and 0 
rows are output.
   
   ---
   `RowToVeloxColumnar`'s role
   `RowToVeloxColumnar` is not the root cause, but the key diagnostic clue is:
   -Its appearance indicates that the htmp1 side did not go through NativeScan, 
but rather went through it vanilla Scan hive                                    
                                                        
   -This is precisely the manifestation of asymmetry: different ORC readers are 
used on both sides, and different readers interpret the type of the DECIMAL 
column of the same ORC file differently (one using Metastore type, the other 
using file footer type)
   -If both sides are NativeScan (or both are Scan hive+`RowToVeloxColumnar`), 
the internal representation of DECIMAL will be consistent, and join will match 
correctly
                                                               
   It can be understood that RowToVeloxColumnar is the symptom, and the 
asymmetric scan path is the cause.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to