Limess opened a new issue #3834:
URL: https://github.com/apache/hudi/issues/3834
**Describe the problem you faced**
Querying the snapshot table (suffix `_rt`) with Amazon Athena fails when
the schema contains nested fields.
**To Reproduce**
Steps to reproduce the behavior:
1. Create a table with a column `entity_salience` that has the following schema:
`array<struct<salience:double,salience_rank:bigint,wiki_title:string>>`
2. Attempt to query the table with Athena
**Environment Description**
EMR 6.4.0
Athena workgroup V2 (experienced on 2021/10/20)
* Hudi version : 0.9.0, 0.8.0-amzn1
* Spark version : 3.1.2
* Hive version : Hive 3.1.2
* Hadoop version : Amazon 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
Several columns produce this issue. Their schemas are as follows:
* `array<struct<offset:bigint,overlapping:boolean,position:string,rule_based_entity:boolean,sentiment:struct<compound:double,neg:double,neu:double,pos:double>,signal_type:string,surface_form:string,wiki_title:string>>`
* `array<struct<id:string,score:string>>`
It isn't obvious what distinguishes the failing columns from the working ones.
For example, a column with this schema has no issues:
`array<struct<end:bigint,start:bigint,text:string>>`
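To make the failing and working column types easier to compare side by side, here is a small sketch that parses the Hive type strings quoted above and lists their struct field names and nesting depth. It is not part of Hudi or Athena: `HiveType`, `parse_hive_type`, and the other helper names are hypothetical, it only understands `array`, `struct`, and simple primitive names, and it does not diagnose the error itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class HiveType:
    """One node of a parsed Hive type string (hypothetical helper type)."""
    name: str                                  # "array", "struct", or a primitive
    element: Optional["HiveType"] = None       # set only for array<...>
    fields: List[Tuple[str, "HiveType"]] = field(default_factory=list)  # struct<...>


def _expect(s: str, pos: int, ch: str) -> None:
    if pos >= len(s) or s[pos] != ch:
        raise ValueError(f"expected {ch!r} at position {pos}")


def _parse(s: str, pos: int) -> Tuple[HiveType, int]:
    # Read the type name up to the next delimiter.
    start = pos
    while pos < len(s) and s[pos] not in "<>,:":
        pos += 1
    name = s[start:pos].strip()
    if not name:
        raise ValueError(f"missing type name at position {start}")
    if name == "array":
        _expect(s, pos, "<")
        element, pos = _parse(s, pos + 1)
        _expect(s, pos, ">")
        return HiveType(name, element=element), pos + 1
    if name == "struct":
        _expect(s, pos, "<")
        pos += 1
        fields = []
        while True:
            colon = s.index(":", pos)          # field_name:field_type
            field_name = s[pos:colon].strip()
            field_type, pos = _parse(s, colon + 1)
            fields.append((field_name, field_type))
            if pos < len(s) and s[pos] == ",":
                pos += 1
                continue
            _expect(s, pos, ">")
            return HiveType(name, fields=fields), pos + 1
    return HiveType(name), pos                 # primitive such as bigint or string


def parse_hive_type(text: str) -> HiveType:
    text = text.strip()
    node, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return node


def struct_field_names(t: HiveType) -> List[str]:
    """Every struct field name in the type, in declaration order."""
    names = []
    if t.element is not None:
        names += struct_field_names(t.element)
    for fname, ftype in t.fields:
        names.append(fname)
        names += struct_field_names(ftype)
    return names


def max_depth(t: HiveType) -> int:
    children = [ftype for _, ftype in t.fields]
    if t.element is not None:
        children.append(t.element)
    return 1 + max((max_depth(c) for c in children), default=0)


# Type strings copied from this issue; the grouping labels are only for display.
FAILING = [
    "array<struct<salience:double,salience_rank:bigint,wiki_title:string>>",
    "array<struct<offset:bigint,overlapping:boolean,position:string,"
    "rule_based_entity:boolean,sentiment:struct<compound:double,neg:double,"
    "neu:double,pos:double>,signal_type:string,surface_form:string,wiki_title:string>>",
    "array<struct<id:string,score:string>>",
]
WORKING = ["array<struct<end:bigint,start:bigint,text:string>>"]

if __name__ == "__main__":
    for label, types in (("fails", FAILING), ("works", WORKING)):
        for type_string in types:
            parsed = parse_hive_type(type_string)
            print(label, "depth", max_depth(parsed), struct_field_names(parsed))
```

Running it prints one line per column, which makes it easier to spot shared field names or deeper nesting among the failing columns.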
**Stacktrace**
```
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://prod-signal-hudi-experiment-datalake/hudi/documents_datalake_from_parquet_merge_on_read_upsert_v2/story_published_date=2020-01-30/cf99fa1e-a678-4dd7-a36e-72e57d50a936-0_16-34-337_20211019174019.parquet (offset=33554432, length=33554432) using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Can't redefine: array
This query ran against the "pipeline_reprocessing_hudi_experiment" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f1c60df8-e018-4210-962c-2cbb21aaa18c
```