taiyang-li commented on issue #8022:
URL: https://github.com/apache/incubator-gluten/issues/8022#issuecomment-2492728235

   For comparison, if we make sure the written map column contains no nulls, the issue no longer appears, and the written and read columns are consistent.
   
   ``` sql 
   drop table if exists tmp.tnmpnn;
   create table tmp.tnmpnn using parquet as 
   with data_source as (
       select
           id as uid,
           case when random() < 0.1 then null else floor(random() * 100) end as rec_room_id,
           case when random() < 0.1 then null else floor(random() * 100) end as room_id,
           case when random() < 0.1 then null else floor(random() * 100) end as dispatch_id,
           case when random() < 0.1 then null else floor(random() * 100) end as gift_value_total,
           case when random() < 0.1 then null else floor(random() * 100) end as follow_channel,
           case when random() < 0.1 then null else floor(random() * 100) end as follow_user,
           case when random() < 0.1 then null else floor(random() * 100) end as followed_channel,
           case when random() < 0.1 then null else floor(random() * 100) end as need_filter,
           case when random() < 0.1 then null else floor(random() * 100) end as mic_time
       from range(100000)
   )
   select
       uid,
       rec_room_id,
       room_id,
       dispatch_id,
       str_to_map(
           concat(
               'gift_value_total:', coalesce(gift_value_total, 0),
               ',follow_channel:', coalesce(follow_channel, 0),
               ',follow_user:', coalesce(follow_user, 0),
               ',followed_channel:', coalesce(followed_channel, 0),
               ',filter:', coalesce(need_filter, 0),
               ',mic_time:', coalesce(mic_time, 0)
           )
       ) as label_map,
       mic_time
   from
       data_source;
   ```
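
   The `coalesce` calls are what keep `label_map` non-null in this version. As a minimal sketch (assuming Spark SQL's usual behavior that `concat` returns NULL when any argument is NULL; the single key and the literal NULL are made up for illustration), the two cases can be compared directly:

   ``` sql
   -- Illustrative only: one hypothetical key instead of the six used in the table.
   -- Without coalesce: concat propagates the NULL, so str_to_map receives NULL
   -- and label_map is NULL for this row.
   select str_to_map(concat('mic_time:', cast(null as bigint))) as label_map;

   -- With coalesce: the NULL is replaced by 0 before concatenation,
   -- so the map is built from the string 'mic_time:0'.
   select str_to_map(concat('mic_time:', coalesce(cast(null as bigint), 0))) as label_map;
   ```

   Note that the standalone `mic_time` column is still written without `coalesce`, so it can be null even though the map value is not.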
   
   ```
   :) select count(1) from hdfs('hdfs://bigocluster/apps/hive/warehouse/tmp.db/tnmpnn/part*.parquet') where label_map is not null and toInt64OrZero(label_map['mic_time']) != mic_time;
   
   SELECT count(1)
   FROM hdfs('hdfs://bigocluster/apps/hive/warehouse/tmp.db/tnmpnn/part*.parquet')
   WHERE (label_map IS NOT NULL) AND (toInt64OrZero(label_map['mic_time']) != mic_time)
   
   Query id: 7d7d1518-6c24-4603-b0b7-8e5b6b61aa44
   
      ┌─count(1)─┐
   1. │        0 │
      └──────────┘
   
   1 row in set. Elapsed: 16.334 sec. Processed 100.00 thousand rows, 1.52 MB (6.12 thousand rows/s., 92.91 KB/s.)
   Peak memory usage: 27.36 MiB.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

