lgbo-ustc opened a new issue, #8828:
URL: https://github.com/apache/incubator-gluten/issues/8828

   ### Backend
   
   CH (ClickHouse)
   
   ### Bug description
   
   ```sql
   select b.uid as uid
   ,a.userid
   ,a.birthday
   ,a.rtime
   from
       (select userid
       ,birthday
       ,rtime
       from
           (select userid
           ,birthday
           ,rtime
           -- ,row_number() over(partition by userid order by rtime desc) as rk
           ,1 as rk
           from
               (
               select userid
               ,birthday
               ,rtime
               from t1
   
               union all
               select userid
               ,from_unixtime(unix_timestamp(content, 'MM/dd/yyyy'),'yyyy-MM-dd') as birthday
               ,time as rtime
               from t1
               where day = '${day}'
               and id = '01505004'
               and emoji_name='birthday'
               and click ='introduction_save'
               and status ='1'
               ) a
           ) a
       where rk=1 and birthday<>''
       ) a
   join
       t2 b
   on a.userid=udfhash(b.uid);
   ```
   
   ```
   Execute InsertIntoHadoopFsRelationCommand (38)
   +- FakeRowAdaptor (37)
      +- AdaptiveSparkPlan (36)
         +- == Final Plan ==
            ^ ProjectExecTransformer (23)
            +- ^ InputIteratorTransformer (22)
               +- RowToCHNativeColumnar (20)
                  +- ShuffledHashJoin Inner BuildLeft (19)
                     :- CHNativeColumnarToRow (12)
                     :  +- AQEShuffleRead (11)
                     :     +- ShuffleQueryStage (10), Statistics(sizeInBytes=26.8 MiB, rowCount=9.05E+5)
                     :        +- ColumnarExchange (9)
                     :           +- ColumnarUnion (8)
                     :              :- ^ FilterExecTransformer (2)
                     :              :  +- ^ ScanTransformer orc indigo_mediate_tb.indigo_swh_uid_birthday (1)
                     :              +- ^ ProjectExecTransformer (6)
                     :                 +- ^ FilterExecTransformer (5)
                     :                    +- ^ ScanTransformer parquet t1(4)
                     +- AQEShuffleRead (18)
                        +- ShuffleQueryStage (17), Statistics(sizeInBytes=28.8 GiB, rowCount=9.66E+8)
                           +- Exchange (16)
                              +- CHNativeColumnarToRow (15)
                                 +- ^ ScanTransformer orc t2(13)
   ```
   
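   Since `udfhash` is an unsupported UDF, the join operation falls back to vanilla Spark (`ShuffledHashJoin` in the plan above). However, the fallback on the left side takes place after the shuffle: the left side is shuffled by `ColumnarExchange`, while the right side is shuffled by the vanilla `Exchange`. As a result, the left partitions don't match the right partitions, and the entire join result is wrong.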
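
To illustrate why mismatched partitioning breaks a shuffled join, here is a minimal Python sketch. It is not Gluten or Spark code: `partition_a`, `partition_b`, and the helper functions are hypothetical stand-ins, and `partition_b` is deliberately built to disagree with `partition_a` on every key. The sketch only assumes that the two exchanges place equal keys in different partitions; it does not claim to reproduce the actual hash functions involved.

```python
import zlib

NUM_PARTITIONS = 4

def partition_a(key: str) -> int:
    # Hypothetical partitioner standing in for one side's exchange.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def partition_b(key: str) -> int:
    # Hypothetical, deliberately incompatible partitioner for the other side:
    # every key lands one partition away from where partition_a puts it.
    return (partition_a(key) + 1) % NUM_PARTITIONS

def shuffle(keys, partitioner):
    # Distribute keys into NUM_PARTITIONS buckets using the given partitioner.
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for key in keys:
        parts[partitioner(key)].append(key)
    return parts

def partitionwise_join(left_parts, right_parts):
    # A shuffled join only compares partition i of the left side
    # with partition i of the right side.
    matches = []
    for left, right in zip(left_parts, right_parts):
        right_keys = set(right)
        matches.extend(k for k in left if k in right_keys)
    return sorted(matches)

keys = [f"user{i}" for i in range(100)]

# Both sides use the same partitioner: every key finds its match.
consistent = partitionwise_join(shuffle(keys, partition_a), shuffle(keys, partition_a))

# The sides use different partitioners: matching keys never meet.
mismatched = partitionwise_join(shuffle(keys, partition_a), shuffle(keys, partition_b))

print(len(consistent), len(mismatched))
```

In this toy setup the consistent join returns all 100 keys and the mismatched join returns none, because a partition-wise join never compares rows that were sent to different partition indices.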
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

