Tomccat3 opened a new issue, #8390:
URL: https://github.com/apache/incubator-gluten/issues/8390

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   Run the SQL below:
   `insert overwrite  directory
       'oss://{bucket}/22/' using parquet
   SELECT
       COALESCE (trace.gazj, log.gazj) AS gazj,
       COALESCE (log.region_info, trace.region_info) region_info,
       COALESCE (log.country, trace.country) country,
       COALESCE (log.brand, trace.brand) brand,
       COALESCE (log.model, trace.model) model,
       COALESCE (log.lang, trace.lang) lang,
       COALESCE (log.os_sv, trace.os_sv) os_sv,
       COALESCE (log.os_bv, trace.os_bv) os_bv,
       COALESCE (trace.init_time, log.init_time) AS init_time,
       COALESCE (trace.init_dt, log.dt) AS init_dt,
       CASE
   WHEN log.dt IS NULL THEN
       CONCAT(COALESCE (trace.active_trace, ''),'0')
   ELSE
       CONCAT(COALESCE (trace.active_trace, ''),'1')END AS active_trace,
       COALESCE (log.time_utc, trace.latest_time_utc) as latest_time_utc,
       COALESCE (log.dt, trace.latest_dt) as latest_dt
   FROM
       (
           SELECT
               gazj,
               region_info,
               country,
               brand,
               model,
               lang,
               os_bv,
               os_sv,
               init_time,
               init_dt,
               active_trace,
               latest_time_utc,
               latest_dt
           FROM
               ${table}
           WHERE
               dt = '20241224'
               and latest_dt >= '20241120' 
       ) AS trace
   FULL JOIN (
       SELECT
                a.gazj,
               COALESCE(b.region_info, 'unknown') AS region_info,
               a.country,
               a.brand,
               a.model,
               a.lang,
               a.os_bv,
               a.os_sv,
               a.init_time,
               a.time_utc,
               a.dt
       FROM (
           SELECT
               gazj,
               country,
               brand,
               model,
               lang,
               os_sv,
               os_bv,
               dt,
               init_time,
               time_utc,
               row_number() over (partition by gazj order by rand()) as rn
           FROM
              ${table}
           WHERE
               dt = '20241224'
               AND length(gazj)=36
       ) a
       LEFT OUTER JOIN ${table} as b
           ON a.country = b.alpha2
       WHERE rn = 1
   ) AS log ON trace.gazj = log.gazj;
   `
   Fallback summary:
   `== Fallback Summary ==
   (15) WindowTopKFilter: Gluten does not touch it or does not support it
   (30) Scan hive trandw.dim_pub_country: Unsupported file format for TextReadFormat.
   
   == Physical Plan ==
   VeloxColumnarToRow (45)
   +- ^ ProjectExecTransformer (43)
      +- ^ ShuffledHashJoinExecTransformer FullOuter BuildLeft (42)
         :- ^ InputIteratorTransformer (9)
         :  +- ShuffleQueryStage (7), Statistics(sizeInBytes=133.7 GiB, rowCount=2.45E+8)
         :     +- ColumnarExchange (6)
         :        +- VeloxResizeBatches (5)
         :           +- ^ ProjectExecTransformer (3)
         :              +- ^ FilterExecTransformer (2)
         :                 +- ^ ScanTransformer parquet trandw.dws_log_device_mix_active_trace_dd (1)
         +- ^ ProjectExecTransformer (41)
            +- ^ BroadcastHashJoinExecTransformer LeftOuter BuildRight (40)
               :- ^ ProjectExecTransformer (29)
               :  +- ^ FilterExecTransformer (28)
               :     +- ^ WindowExecTransformer (27)
               :        +- ^ SortExecTransformer (26)
               :           +- ^ InputIteratorTransformer (25)
               :              +- ShuffleQueryStage (23), Statistics(sizeInBytes=25.2 GiB, rowCount=1.73E+8)
               :                 +- ColumnarExchange (22)
               :                    +- VeloxResizeBatches (21)
               :                       +- ^ ProjectExecTransformer (19)
               :                          +- ^ InputIteratorTransformer (18)
               :                             +- RowToVeloxColumnar (16)
               :                                +- * WindowTopKFilter (15)
               :                                   +- VeloxColumnarToRow (14)
               :                                      +- ^ ProjectExecTransformer (12)
               :                                         +- ^ FilterExecTransformer (11)
               :                                            +- ^ ScanTransformer parquet trandw.dwd_log_device_mix_active_di (10)
               +- ^ InputIteratorTransformer (39)
                  +- BroadcastQueryStage (37), Statistics(sizeInBytes=6.6 KiB, rowCount=250)
                     +- ColumnarBroadcastExchange (36)
                        +- ^ FilterExecTransformer (34)
                           +- ^ InputIteratorTransformer (33)
                              +- RowToVeloxColumnar (31)
                                 +- Scan hive trandw.dim_pub_country (30)`
   Why does the following fallback happen?
   (15) WindowTopKFilter: Gluten does not touch it or does not support it
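
   A reduced query may help isolate this fallback. The sketch below is an assumption, not something that was run for this report: it keeps only the `row_number() ... WHERE rn = 1` subquery, on the guess that this pattern produces the `WindowTopKFilter` node (in the plan above, node 15 sits under the `WindowExecTransformer` branch). `${table}` is the same placeholder used in the full query, and the trimmed column list is hypothetical.

   ```sql
   -- Hypothetical reduced reproduction: top-1 row per gazj, nothing else.
   -- Compare the resulting plan with the full plan in this report.
   EXPLAIN FORMATTED
   SELECT gazj, dt
   FROM (
       SELECT
           gazj,
           dt,
           row_number() OVER (PARTITION BY gazj ORDER BY rand()) AS rn
       FROM ${table}
       WHERE dt = '20241224'
         AND length(gazj) = 36
   ) a
   WHERE rn = 1;
   ```

   If the operator still appears in the plan for this query alone, the joins and the `COALESCE` projections in the full statement can probably be ruled out as the trigger.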
   
   ### Spark version
   
   Spark-3.3.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

