Re: [I] [VL] Results are mismatch with vanilla Spark when using if expression [incubator-gluten]

via GitHub Thu, 01 Aug 2024 00:32:56 -0700


Z1Wu commented on issue #6673:
URL: https://github.com/apache/incubator-gluten/issues/6673#issuecomment-2262252061


   This may be the same issue as https://github.com/apache/incubator-gluten/issues/5638.
   
   I ran into the same issue: Spark with Gluten returns NULL when reading ORC files written by older Hive versions, which do not store the actual column names in the ORC schema (the file only contains placeholder names such as `_col0`, `_col1`).
   
   A sample ORC file looks like this (output of `hive --orcfiledump <orc_file_path>`):
   ```
   # Hive table schema
   
   CREATE TABLE `test_orc_table_hive_gluten`(
     `id` int,
     `name` string)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
   
   # ORC file schema
   
   Rows: 2
   Compression: SNAPPY
   Compression size: 262144
   Type: struct<_col0:int,_col1:string>
   
   Stripe Statistics:
     Stripe 1:
       Column 0: count: 2 hasNull: false
       Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
       Column 2: count: 2 hasNull: false min: a max: b sum: 2
   
   File Statistics:
     Column 0: count: 2 hasNull: false
     Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
     Column 2: count: 2 hasNull: false min: a max: b sum: 2
   
   ```
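To illustrate why the columns come back as NULL, here is a minimal Python sketch (not Gluten or Velox code) contrasting a purely name-based column lookup with a positional fallback for files whose physical schema only contains Hive's `_colN` placeholder names. All function names are hypothetical. As far as I know, vanilla Spark's native ORC reader (`OrcUtils`) applies a similar "all field names start with `_col`" check before mapping columns by index, which would explain why vanilla Spark reads this file correctly, but treat that as an assumption.

```python
from typing import Dict, List, Optional

# Schemas from the sample above: Hive table columns vs. physical ORC fields.
TABLE_FIELDS = ["id", "name"]
FILE_FIELDS = ["_col0", "_col1"]  # struct<_col0:int,_col1:string>


def map_by_name(table_fields: List[str], file_fields: List[str]) -> Dict[str, Optional[int]]:
    """Name-based lookup: a table column with no same-named file field maps to None (read as NULL).

    Case-insensitive matching is an assumption made for this sketch.
    """
    index = {name.lower(): i for i, name in enumerate(file_fields)}
    return {name: index.get(name.lower()) for name in table_fields}


def has_hive_placeholder_names(file_fields: List[str]) -> bool:
    """True if every physical field uses the _colN placeholder names written by older Hive."""
    return len(file_fields) > 0 and all(f.startswith("_col") for f in file_fields)


def map_with_positional_fallback(
    table_fields: List[str], file_fields: List[str]
) -> Dict[str, Optional[int]]:
    """Map table column i to file field i when the file has only placeholder names, else by name."""
    if has_hive_placeholder_names(file_fields):
        # Columns beyond the file's field count have no data and map to None.
        return {name: (i if i < len(file_fields) else None) for i, name in enumerate(table_fields)}
    return map_by_name(table_fields, file_fields)


if __name__ == "__main__":
    print(map_by_name(TABLE_FIELDS, FILE_FIELDS))  # {'id': None, 'name': None}
    print(map_with_positional_fallback(TABLE_FIELDS, FILE_FIELDS))  # {'id': 0, 'name': 1}
```

With name-based lookup alone, neither `id` nor `name` exists in the file, so both columns read as NULL; the positional fallback recovers the intended mapping.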
   

