pranotishanbhag edited a comment on issue #3841:
URL: https://github.com/apache/hudi/issues/3841#issuecomment-955083608


   Hi,
   
   I am facing the same issue with Hudi 0.9. My schema is as follows:
   ```
   root
    |-- _hoodie_commit_time: string (nullable = true)
    |-- _hoodie_commit_seqno: string (nullable = true)
    |-- _hoodie_record_key: string (nullable = true)
    |-- _hoodie_partition_path: string (nullable = true)
    |-- _hoodie_file_name: string (nullable = true)
    |-- is_deleted: boolean (nullable = true)
    |-- dedupe_key: long (nullable = true)
    |-- ums_last_updated_date: long (nullable = true)
    |-- source_created_date: long (nullable = true)
    |-- item_pairs: array (nullable = true)
    |  |-- element: struct (containsNull = true)
    |  |  |-- invalid_reasons: array (nullable = true)
    |  |  |  |-- element: string (containsNull = true)
    |  |  |-- additional_attributes: string (nullable = true)
    |  |  |-- mapping_state: string (nullable = true)
    |  |  |-- to_item_version: long (nullable = true)
    |  |  |-- to_item_attributes: string (nullable = true)
    |  |  |-- to_region_id: string (nullable = true)
    |  |  |-- to_marketplace_id: string (nullable = true)
    |  |  |-- to_item_id: string (nullable = true)
    |  |  |-- to_website_id: string (nullable = true)
    |  |  |-- to_catalog_id: string (nullable = true)
    |  |  |-- from_item_version: long (nullable = true)
    |  |  |-- from_item_attributes: string (nullable = true)
    |  |  |-- from_region_id: string (nullable = true)
    |  |  |-- from_marketplace_id: string (nullable = true)
    |  |  |-- from_item_id: string (nullable = true)
    |  |  |-- from_website_id: string (nullable = true)
    |  |  |-- from_catalog_id: string (nullable = true)
    |-- mapping_source: string (nullable = true)
    |-- state: string (nullable = true)
    |-- mapping_type: string (nullable = true)
    |-- id: string (nullable = true)
    |-- mapping_class: string (nullable = true)
   ```
   
   I do have a list of item_pairs, but I do not see any column repeated in my schema.
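   
   A minimal sketch of how the schema could be checked for repeated field names at every nesting level, rather than by eye. The object name `SchemaDuplicateCheck` and the method `duplicateFields` are hypothetical helpers, not part of Hudi or Spark; the sketch relies only on Spark's `StructType`, `ArrayType`, and `MapType`. Names are compared case-insensitively on the assumption that Spark's default case-insensitive resolution applies.
   
   ```scala
   import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

   // Hypothetical helper, not a Hudi or Spark API.
   object SchemaDuplicateCheck {
     // Returns dotted paths of field names that occur more than once inside
     // the same struct (case-insensitive), descending into nested structs,
     // array elements, and map values.
     def duplicateFields(dt: DataType, path: String = ""): Seq[String] = dt match {
       case st: StructType =>
         // Duplicates among the direct children of this struct.
         val dupsHere = st.fieldNames.toSeq
           .groupBy(_.toLowerCase)
           .collect { case (_, names) if names.size > 1 => path + names.head }
           .toSeq
         // Duplicates found deeper in the tree.
         val dupsBelow = st.fields.toSeq.flatMap { f =>
           duplicateFields(f.dataType, s"$path${f.name}.")
         }
         dupsHere ++ dupsBelow
       case ArrayType(elementType, _) => duplicateFields(elementType, path)
       case MapType(_, valueType, _)  => duplicateFields(valueType, path)
       case _                         => Seq.empty
     }
   }
   ```
   
   Calling `SchemaDuplicateCheck.duplicateFields(df.schema)` on the DataFrame being written (here `df` is a placeholder name) returns an empty sequence when no struct contains a repeated name.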
   
   I have also set these options:
   ```
       val sparkConf = new SparkConf()
         .setAppName(appName)
         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .set("spark.sql.hive.convertMetastoreParquet", "false")
         .set("spark.hadoop.parquet.avro.add-list-element-records", "false") // null array handling
         .set("spark.hadoop.parquet.avro.write-old-list-structure", "false") // schema evolution
         .set("parquet.avro.add-list-element-records", "false") // null array handling
         .set("parquet.avro.write-old-list-structure", "false") // schema evolution
   ```
   
   I have also set hoodie.avro.schema.validate = true, but I don't see any schema issue reported with this option.
   
   I am using COW mode with Hudi 0.9 and Spark 2.4.
   
   Could you please help with this? My launch is blocked because of this issue.
   
   Thanks,
   Pranoti

