Z1Wu commented on issue #6673:
URL: https://github.com/apache/incubator-gluten/issues/6673#issuecomment-2262441003
> @Z1Wu It looks like the table schema is the same (`DESCRIBE FORMATTED <table_name>`).
This is an old table. A Hive ORC table has a table schema, and each of its ORC data files should also contain a schema. However, ORC data files written by some older engines (like Hive 1.x) contain an incomplete schema: the column names are missing.
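The sketch below shows why missing column names can turn every value into NULL. It assumes the reader matches table columns to file columns by name. The helper names (`resolve_by_name`, `resolve_by_position`, `read_row`) are hypothetical and are not Gluten or ORC APIs. The columns are the `id`, `name` columns of the example table in this issue, and the row is a made-up sample.

```python
from typing import Dict, List, Optional


def resolve_by_name(table_cols: List[str], file_cols: List[str]) -> Dict[str, Optional[int]]:
    """Map each table column to a file column index by name; None if no field has that name."""
    index = {name: i for i, name in enumerate(file_cols)}
    return {col: index.get(col) for col in table_cols}


def resolve_by_position(table_cols: List[str], file_cols: List[str]) -> Dict[str, Optional[int]]:
    """Map the i-th table column to the i-th file column, if the file has that many columns."""
    return {col: (i if i < len(file_cols) else None) for i, col in enumerate(table_cols)}


def read_row(row: List[object], mapping: Dict[str, Optional[int]]) -> Dict[str, object]:
    """Project one file row onto the table schema; unresolved columns become None (NULL)."""
    return {col: (row[i] if i is not None else None) for col, i in mapping.items()}


table_cols = ["id", "name"]     # column names from the Hive metastore DDL
file_cols = ["_col0", "_col1"]  # placeholder names found in a Hive 1.x-written ORC file
row = [1, "a"]                  # a hypothetical sample row

print(read_row(row, resolve_by_name(table_cols, file_cols)))      # {'id': None, 'name': None}
print(read_row(row, resolve_by_position(table_cols, file_cols)))  # {'id': 1, 'name': 'a'}
```

Matching by name finds no `id` or `name` field in the file, so both columns come back NULL. Matching by position recovers the data, but only if the file's column order matches the table's.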
For a Hive ORC table created by:
```
CREATE TABLE `test_orc_table_hive_gluten`(
`id` int,
`name` string)
PARTITIONED BY (
`dt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```
You can get an ORC data file's schema with this command:
```
# hive --orcfiledump <your orc data file>
hive --orcfiledump hdfs:///data/warehouse2/test_orc_table_hive_gluten/dt=20240728/000000_0
```
For a file with a malformed schema, the output looks like the one below. Gluten cannot correctly read an ORC file whose schema is `Type: struct<_col0:int,_col1:string>`: the result is always NULL. The expected ORC file schema is `Type: struct<id:int,name:string>`.
```
File Version: 0.12 with HIVE_8732
24/08/01 15:28:01 INFO orc.ReaderImpl: Reading ORC rows from hdfs://data/warehouse2/test_orc_table_hive_gluten/dt=20240728/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
Rows: 2
Compression: SNAPPY
Compression size: 262144
Type: struct<_col0:int,_col1:string>
Stripe Statistics:
Stripe 1:
Column 0: count: 2 hasNull: false
Column 1: count: 2 hasNull: false min: 1 max: 2 sum: 3
Column 2: count: 2 hasNull: false min: a max: b sum: 2
```