Z1Wu commented on issue #6673:
URL: https://github.com/apache/incubator-gluten/issues/6673#issuecomment-2262252061

   This may be the same issue as https://github.com/apache/incubator-gluten/issues/5638.
   
   I hit the same problem: spark-gluten returns NULL when reading an ORC file 
written by an older version of Hive, which does not store the actual column 
names in the ORC schema (it writes `_col0`, `_col1`, ... instead).
   
   A sample table and its ORC file metadata (from `hive --orcfiledump <orc_file_path>`):
   ```
   # hive table schema
   
   CREATE TABLE `test_orc_table_hive_gluten`(
     `id` int,
     `name` string)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
   
   # orcfile schema 
   
   Rows: 2
   Compression: SNAPPY
   Compression size: 262144
   Type: struct<_col0:int,_col1:string>
   
   Stripe Statistics:
     Stripe 1:
       Column 0: count: 2 hasNull: false
       Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
       Column 2: count: 2 hasNull: false min: a max: b sum: 2
   
   File Statistics:
     Column 0: count: 2 hasNull: false
     Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
     Column 2: count: 2 hasNull: false min: a max: b sum: 2
   
   ```
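
To illustrate why the reader sees NULL here, below is a minimal sketch of name-based versus positional column resolution. It is not Gluten's or Velox's actual code; `has_placeholder_names`, `resolve_by_name`, and `resolve_columns` are hypothetical helpers made up for this example. The assumption is that a reader matching purely by name finds neither `id` nor `name` among `_col0`/`_col1`, while falling back to position recovers the mapping.

```python
import re

# Older Hive writers store placeholder names (_col0, _col1, ...) in the ORC footer.
_PLACEHOLDER = re.compile(r"^_col(\d+)$")


def has_placeholder_names(file_columns):
    """Return True when every file column is named _col<i> at position i."""
    if not file_columns:
        return False
    for i, name in enumerate(file_columns):
        m = _PLACEHOLDER.match(name)
        if m is None or int(m.group(1)) != i:
            return False
    return True


def resolve_by_name(table_columns, file_columns):
    """Name-only matching: a table column with no match maps to None (read as NULL)."""
    lookup = {name.lower(): i for i, name in enumerate(file_columns)}
    return {name: lookup.get(name.lower()) for name in table_columns}


def resolve_columns(table_columns, file_columns):
    """Hypothetical fallback: match by position when the file has placeholder names."""
    if has_placeholder_names(file_columns):
        return {
            name: (i if i < len(file_columns) else None)
            for i, name in enumerate(table_columns)
        }
    return resolve_by_name(table_columns, file_columns)


table = ["id", "name"]        # from the Hive table schema above
orc_file = ["_col0", "_col1"]  # from "Type: struct<_col0:int,_col1:string>"

print(resolve_by_name(table, orc_file))   # {'id': None, 'name': None}
print(resolve_columns(table, orc_file))   # {'id': 0, 'name': 1}
```

With the file above, name-only matching leaves both columns unresolved, which is consistent with the NULL results, whereas the positional fallback maps `id` to `_col0` and `name` to `_col1`.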
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

