Limess opened a new issue #3834:
URL: https://github.com/apache/hudi/issues/3834


   **Describe the problem you faced**
   
   Querying the snapshot table (suffix `_rt`) with Amazon Athena fails when the schema contains nested fields.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a table with a column `entity_salience` of the following type: `array<struct<salience:double,salience_rank:bigint,wiki_title:string>>`
   2. Attempt to query the table with Athena
   
   **Environment Description**
   
   EMR 6.4.0
   
   Athena workgroup V2 (experienced on 2021/10/20)
   
   * Hudi version :
   
   0.9.0
   0.8.0-amzn1
   
   * Spark version :
   
   3.1.2
   
   * Hive version :
   
   Hive 3.1.2
   
   * Hadoop version :
   
   Amazon 3.2.1
   
   * Storage (HDFS/S3/GCS..) :
   
   S3
   
   * Running on Docker? (yes/no) :
   
   no
   
   **Additional context**
   
   We have several columns which produce this issue, the schemas are as follows:
   
   * `array<struct<offset:bigint,overlapping:boolean,position:string,rule_based_entity:boolean,sentiment:struct<compound:double,neg:double,neu:double,pos:double>,signal_type:string,surface_form:string,wiki_title:string>>`
   * `array<struct<id:string,score:string>>`
                                
   
   It is not obvious what distinguishes the failing columns from the working ones. For example, a column with this schema has no issues:
   
   `array<struct<end:bigint,start:bigint,text:string>>`
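
To make the failing and working column types easier to compare, here is a minimal Python sketch that parses the Hive type strings quoted in this report and summarizes their nesting. It is only a diagnostic aid, not part of Hudi or Athena. The names `HiveType`, `parse_hive_type`, `summarize` and the `failing_*` / `working` labels are hypothetical (only `entity_salience` is a real column name from this report), and the parser handles only `array<...>`, `struct<...>` and simple primitive names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class HiveType:
    kind: str  # "array", "struct", or a primitive name such as "bigint"
    element: Optional["HiveType"] = None  # element type, set for arrays
    fields: List[Tuple[str, "HiveType"]] = field(default_factory=list)  # struct fields


def _expect(s: str, i: int, ch: str) -> None:
    if i >= len(s) or s[i] != ch:
        raise ValueError(f"expected {ch!r} at position {i}")


def _parse(s: str, i: int) -> Tuple[HiveType, int]:
    # Read the type name (letters, digits, underscores).
    j = i
    while j < len(s) and (s[j].isalnum() or s[j] == "_"):
        j += 1
    name = s[i:j]
    if not name:
        raise ValueError(f"expected a type name at position {i}")
    if name == "array":
        _expect(s, j, "<")
        elem, j = _parse(s, j + 1)
        _expect(s, j, ">")
        return HiveType("array", element=elem), j + 1
    if name == "struct":
        _expect(s, j, "<")
        j += 1
        fields = []
        while True:
            k = s.index(":", j)  # field name ends at the next colon
            fname = s[j:k]
            if not fname:
                raise ValueError(f"empty field name at position {j}")
            ftype, j = _parse(s, k + 1)
            fields.append((fname, ftype))
            if j < len(s) and s[j] == ",":
                j += 1
                continue
            _expect(s, j, ">")
            return HiveType("struct", fields=fields), j + 1
    return HiveType(name), j  # primitive type


def parse_hive_type(text: str) -> HiveType:
    s = text.strip()
    node, pos = _parse(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return node


def summarize(t: HiveType) -> dict:
    """Return nesting depth and the number of structs and arrays in a type."""
    if t.kind == "array":
        inner = summarize(t.element)
        return {"depth": inner["depth"] + 1, "structs": inner["structs"],
                "arrays": inner["arrays"] + 1}
    if t.kind == "struct":
        subs = [summarize(ft) for _, ft in t.fields]
        return {"depth": 1 + max((s["depth"] for s in subs), default=0),
                "structs": 1 + sum(s["structs"] for s in subs),
                "arrays": sum(s["arrays"] for s in subs)}
    return {"depth": 0, "structs": 0, "arrays": 0}


# Type strings copied from this report; all labels except entity_salience are made up.
COLUMNS = {
    "entity_salience": "array<struct<salience:double,salience_rank:bigint,wiki_title:string>>",
    "failing_1": ("array<struct<offset:bigint,overlapping:boolean,position:string,"
                  "rule_based_entity:boolean,"
                  "sentiment:struct<compound:double,neg:double,neu:double,pos:double>,"
                  "signal_type:string,surface_form:string,wiki_title:string>>"),
    "failing_2": "array<struct<id:string,score:string>>",
    "working": "array<struct<end:bigint,start:bigint,text:string>>",
}

for label, type_string in COLUMNS.items():
    print(label, summarize(parse_hive_type(type_string)))
```

On the types quoted here, `entity_salience` and the working column produce the same summary, so nesting shape alone does not seem to separate the failing columns from the working one.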
   
   **Stacktrace**
   
   ```
   HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://prod-signal-hudi-experiment-datalake/hudi/documents_datalake_from_parquet_merge_on_read_upsert_v2/story_published_date=2020-01-30/cf99fa1e-a678-4dd7-a36e-72e57d50a936-0_16-34-337_20211019174019.parquet (offset=33554432, length=33554432) using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Can't redefine: array
   This query ran against the "pipeline_reprocessing_hudi_experiment" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f1c60df8-e018-4210-962c-2cbb21aaa18c
   ```
   
   

