lgbo-ustc opened a new issue, #8828:
URL: https://github.com/apache/incubator-gluten/issues/8828
### Backend
CH (ClickHouse)
### Bug description
```sql
select b.uid as uid
    ,a.userid
    ,a.birthday
    ,a.rtime
from
    (select userid
        ,birthday
        ,rtime
    from
        (select userid
            ,birthday
            ,rtime
            -- ,row_number() over(partition by userid order by rtime desc) as rk
            ,1 as rk
        from
            (
            select userid
                ,birthday
                ,rtime
            from t1
            union all
            select userid
                ,from_unixtime(unix_timestamp(content, 'MM/dd/yyyy'),'yyyy-MM-dd') as birthday
                ,time as rtime
            from t1
            where day = '${day}'
                and id = '01505004'
                and emoji_name='birthday'
                and click ='introduction_save'
                and status ='1'
            ) a
        ) a
    where rk=1 and birthday<>''
    ) a
join
    t2 b
on a.userid=udfhash(b.uid);
```
```
Execute InsertIntoHadoopFsRelationCommand (38)
+- FakeRowAdaptor (37)
   +- AdaptiveSparkPlan (36)
      +- == Final Plan ==
         ^ ProjectExecTransformer (23)
         +- ^ InputIteratorTransformer (22)
            +- RowToCHNativeColumnar (20)
               +- ShuffledHashJoin Inner BuildLeft (19)
                  :- CHNativeColumnarToRow (12)
                  :  +- AQEShuffleRead (11)
                  :     +- ShuffleQueryStage (10), Statistics(sizeInBytes=26.8 MiB, rowCount=9.05E+5)
                  :        +- ColumnarExchange (9)
                  :           +- ColumnarUnion (8)
                  :              :- ^ FilterExecTransformer (2)
                  :              :  +- ^ ScanTransformer orc indigo_mediate_tb.indigo_swh_uid_birthday (1)
                  :              +- ^ ProjectExecTransformer (6)
                  :                 +- ^ FilterExecTransformer (5)
                  :                    +- ^ ScanTransformer parquet t1(4)
                  +- AQEShuffleRead (18)
                     +- ShuffleQueryStage (17), Statistics(sizeInBytes=28.8 GiB, rowCount=9.66E+8)
                        +- Exchange (16)
                           +- CHNativeColumnarToRow (15)
                              +- ^ ScanTransformer orc t2(13)
```
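Since `udfhash` is an unsupported UDF, the join falls back to vanilla Spark
(`ShuffledHashJoin (19)`). However, the fallback on the left side takes place
after the shuffle: `CHNativeColumnarToRow (12)` sits above
`ColumnarExchange (9)`, while the right side is converted to rows before its
`Exchange (16)`. As a result, the left partitions don't match the right
partitions, and the join result is wrong.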
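
To illustrate why mismatched partitioning breaks a shuffled hash join, here is a minimal Python sketch. It is not Gluten or Spark code. It assumes the two exchanges assign rows to partitions with different functions, which is one possible reading of the plan above, and the two hash functions (`native_hash`, `vanilla_hash`) are hypothetical toy stand-ins, not the real CH or Spark hashes.

```python
NUM_PARTITIONS = 4

# Hypothetical toy partitioners, NOT the real hash functions used by the
# CH backend or by vanilla Spark. They only need to differ from each other.
def native_hash(key: int) -> int:
    return key * 31 + 7

def vanilla_hash(key: int) -> int:
    return key * 17 + 3

def shuffle(rows, key_field, hash_fn):
    """Distribute rows into NUM_PARTITIONS buckets by hash of the join key."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[hash_fn(row[key_field]) % NUM_PARTITIONS].append(row)
    return parts

def shuffled_hash_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right."""
    out = []
    for left_part, right_part in zip(left_parts, right_parts):
        build = {}
        for l in left_part:  # build side = left, as in BuildLeft
            build.setdefault(l["userid"], []).append(l)
        for r in right_part:  # probe side = right
            for l in build.get(r["key"], []):
                out.append((r["uid"], l["userid"]))
    return out

# 100 distinct join keys present on both sides, so a correct inner join
# returns 100 rows. "key" stands in for the value of udfhash(b.uid).
left = [{"userid": k} for k in range(100)]
right = [{"uid": f"u{k}", "key": k} for k in range(100)]

# Both sides shuffled with the same function: every key meets its match.
same = shuffled_hash_join(
    shuffle(left, "userid", native_hash),
    shuffle(right, "key", native_hash),
)

# Left shuffled with one function, right with another: matching keys
# often land in different partitions and are silently dropped.
mixed = shuffled_hash_join(
    shuffle(left, "userid", native_hash),
    shuffle(right, "key", vanilla_hash),
)

print(len(same), len(mixed))
```

With these toy functions only the even keys happen to land in the same partition on both sides, so the second join silently loses half of the rows. No error is raised, which matches the symptom here: the query succeeds but the output is wrong.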
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]