Tomccat3 opened a new issue, #8390:
URL: https://github.com/apache/incubator-gluten/issues/8390
### Backend
VL (Velox)
### Bug description
Run the SQL below:
`insert overwrite directory 'oss://{bucket}/22/' using parquet
SELECT
  COALESCE(trace.gazj, log.gazj) AS gazj,
  COALESCE(log.region_info, trace.region_info) region_info,
  COALESCE(log.country, trace.country) country,
  COALESCE(log.brand, trace.brand) brand,
  COALESCE(log.model, trace.model) model,
  COALESCE(log.lang, trace.lang) lang,
  COALESCE(log.os_sv, trace.os_sv) os_sv,
  COALESCE(log.os_bv, trace.os_bv) os_bv,
  COALESCE(trace.init_time, log.init_time) AS init_time,
  COALESCE(trace.init_dt, log.dt) AS init_dt,
  CASE
    WHEN log.dt IS NULL THEN CONCAT(COALESCE(trace.active_trace, ''), '0')
    ELSE CONCAT(COALESCE(trace.active_trace, ''), '1')
  END AS active_trace,
  COALESCE(log.time_utc, trace.latest_time_utc) as latest_time_utc,
  COALESCE(log.dt, trace.latest_dt) as latest_dt
FROM
  (
    SELECT
      gazj,
      region_info,
      country,
      brand,
      model,
      lang,
      os_bv,
      os_sv,
      init_time,
      init_dt,
      active_trace,
      latest_time_utc,
      latest_dt
    FROM
      ${table}
    WHERE
      dt = '20241224'
      and latest_dt >= '20241120'
  ) AS trace
  FULL JOIN (
    SELECT
      a.gazj,
      COALESCE(b.region_info, 'unknown') AS region_info,
      a.country,
      a.brand,
      a.model,
      a.lang,
      a.os_bv,
      a.os_sv,
      a.init_time,
      a.time_utc,
      a.dt
    FROM (
      SELECT
        gazj,
        country,
        brand,
        model,
        lang,
        os_sv,
        os_bv,
        dt,
        init_time,
        time_utc,
        row_number() over (partition by gazj order by rand()) as rn
      FROM
        ${table}
      WHERE
        dt = '20241224'
        AND length(gazj)=36
    ) a
    LEFT OUTER JOIN ${table} as b
      ON a.country = b.alpha2
    WHERE rn = 1
  ) AS log ON trace.gazj = log.gazj;
`
Fallback summary:
`== Fallback Summary ==
(15) WindowTopKFilter: Gluten does not touch it or does not support it
(30) Scan hive trandw.dim_pub_country: Unsupported file format for TextReadFormat.
== Physical Plan ==
VeloxColumnarToRow (45)
+- ^ ProjectExecTransformer (43)
   +- ^ ShuffledHashJoinExecTransformer FullOuter BuildLeft (42)
      :- ^ InputIteratorTransformer (9)
      :  +- ShuffleQueryStage (7), Statistics(sizeInBytes=133.7 GiB, rowCount=2.45E+8)
      :     +- ColumnarExchange (6)
      :        +- VeloxResizeBatches (5)
      :           +- ^ ProjectExecTransformer (3)
      :              +- ^ FilterExecTransformer (2)
      :                 +- ^ ScanTransformer parquet trandw.dws_log_device_mix_active_trace_dd (1)
      +- ^ ProjectExecTransformer (41)
         +- ^ BroadcastHashJoinExecTransformer LeftOuter BuildRight (40)
            :- ^ ProjectExecTransformer (29)
            :  +- ^ FilterExecTransformer (28)
            :     +- ^ WindowExecTransformer (27)
            :        +- ^ SortExecTransformer (26)
            :           +- ^ InputIteratorTransformer (25)
            :              +- ShuffleQueryStage (23), Statistics(sizeInBytes=25.2 GiB, rowCount=1.73E+8)
            :                 +- ColumnarExchange (22)
            :                    +- VeloxResizeBatches (21)
            :                       +- ^ ProjectExecTransformer (19)
            :                          +- ^ InputIteratorTransformer (18)
            :                             +- RowToVeloxColumnar (16)
            :                                +- * WindowTopKFilter (15)
            :                                   +- VeloxColumnarToRow (14)
            :                                      +- ^ ProjectExecTransformer (12)
            :                                         +- ^ FilterExecTransformer (11)
            :                                            +- ^ ScanTransformer parquet trandw.dwd_log_device_mix_active_di (10)
            +- ^ InputIteratorTransformer (39)
               +- BroadcastQueryStage (37), Statistics(sizeInBytes=6.6 KiB, rowCount=250)
                  +- ColumnarBroadcastExchange (36)
                     +- ^ FilterExecTransformer (34)
                        +- ^ InputIteratorTransformer (33)
                           +- RowToVeloxColumnar (31)
                              +- Scan hive trandw.dim_pub_country (30)`
Why does the following fallback happen?
(15) WindowTopKFilter: Gluten does not touch it or does not support it
### Spark version
Spark-3.3.x
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]