Z1Wu commented on issue #6673: URL: https://github.com/apache/incubator-gluten/issues/6673#issuecomment-2262252061
This may be the same issue as https://github.com/apache/incubator-gluten/issues/5638. I ran into the same problem: Spark with Gluten returns NULL when reading ORC files written by an older Hive version, because that version does not store the actual column names in the ORC schema. The file fields are named `_col0` and `_col1` instead of `id` and `name`. A sample ORC file looks like this (output of `hive --orcfiledump <orc_file_path>`):

```
# hive table schema
CREATE TABLE `test_orc_table_hive_gluten`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

# orcfile schema
Rows: 2
Compression: SNAPPY
Compression size: 262144
Type: struct<_col0:int,_col1:string>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 2 hasNull: false
    Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
    Column 2: count: 2 hasNull: false min: a max: b sum: 2

File Statistics:
  Column 0: count: 2 hasNull: false
  Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
  Column 2: count: 2 hasNull: false min: a max: b sum: 2
```
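The following minimal Python sketch shows why the values come back as NULL. It assumes the reader resolves table columns against the ORC file schema purely by name. The helpers `match_by_name` and `resolve_columns` are hypothetical and are not Gluten or Spark APIs. The positional fallback for Hive-style `_colN` names is one possible workaround and is not necessarily how Gluten handles it.

```python
import re
from typing import Dict, List, Optional

# Placeholder field names written by older Hive versions: "_col0", "_col1", ...
_HIVE_PLACEHOLDER = re.compile(r"_col\d+")


def match_by_name(table_cols: List[str], file_fields: List[str]) -> Dict[str, Optional[int]]:
    """Hypothetical name-only lookup: None means 'not in file', read back as NULL."""
    index = {field: i for i, field in enumerate(file_fields)}
    return {col: index.get(col) for col in table_cols}


def resolve_columns(table_cols: List[str], file_fields: List[str]) -> Dict[str, Optional[int]]:
    """Hypothetical lookup that falls back to position for `_colN`-only schemas."""
    if file_fields and all(_HIVE_PLACEHOLDER.fullmatch(f) for f in file_fields):
        # The names carry no information, so map table column i to file field i.
        return {
            col: (i if i < len(file_fields) else None)
            for i, col in enumerate(table_cols)
        }
    # Real column names are present: plain name matching works.
    return match_by_name(table_cols, file_fields)


table_cols = ["id", "name"]              # from the Hive table DDL
file_fields = ["_col0", "_col1"]         # from struct<_col0:int,_col1:string>
print(match_by_name(table_cols, file_fields))    # every column unmatched
print(resolve_columns(table_cols, file_fields))  # columns mapped by position
```

With name-only matching, neither `id` nor `name` is found in `struct<_col0:int,_col1:string>`, so both would be read as NULL. The positional fallback only applies when every field is a placeholder, so files that carry real column names are still matched by name.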
