BruceKellan commented on issue #8685:
URL: https://github.com/apache/hudi/issues/8685#issuecomment-1548964729
@danny0405
Danny, I have initially located the problem and would like to hear your opinion.
On master, hudi-flink maintains its own logic for writing Parquet, and the resulting schema is inconsistent with the schema of Parquet files written by Spark when complex types are used.
I ran some tests writing the same complex types with Flink and Spark, and the Parquet schemas they produce are different.
The biggest difference is that the map key field is `required` in the Spark schema but `optional` in the Flink schema.
spark_insert.parquet:
```
message hoodie.hudi_trips_cow.hudi_trips_cow_record {
  optional binary _hoodie_commit_time (STRING);
  optional binary _hoodie_commit_seqno (STRING);
  optional binary _hoodie_record_key (STRING);
  optional binary _hoodie_partition_path (STRING);
  optional binary _hoodie_file_name (STRING);
  optional int32 f_int;
  optional group f_array (LIST) {
    repeated binary array (STRING);
  }
  optional group int_array (LIST) {
    repeated int32 array;
  }
  optional group f_map (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (STRING);
      optional int32 value;
    }
  }
  optional group f_row {
    optional group f_nested_array (LIST) {
      repeated binary array (STRING);
    }
    optional group f_nested_row {
      optional int32 f_row_f0;
      optional binary f_row_f1 (STRING);
    }
  }
}
```
flink_insert.parquet:
```
message flink_schema {
  optional binary _hoodie_commit_time (STRING);
  optional binary _hoodie_commit_seqno (STRING);
  optional binary _hoodie_record_key (STRING);
  optional binary _hoodie_partition_path (STRING);
  optional binary _hoodie_file_name (STRING);
  required int32 f_int;
  optional group f_array (LIST) {
    repeated group list {
      optional binary element (STRING);
    }
  }
  optional group int_array (LIST) {
    repeated group list {
      optional int32 element;
    }
  }
  optional group f_map (MAP) {
    repeated group key_value {
      optional binary key (STRING);
      optional int32 value;
    }
  }
  optional group f_row {
    optional group f_nested_array (LIST) {
      repeated group list {
        optional binary element (STRING);
      }
    }
    optional group f_nested_row {
      optional int32 f_row_f0;
      optional binary f_row_f1 (STRING);
    }
  }
}
```
The reason there was no problem in 0.12.3 is #7345.
That PR seems to work for Spark, but because of the Flink schema inconsistency, an error is reported once the requested projection schema is set.
https://github.com/apache/hudi/blob/d2b411ad192cc5113363398e985cb21647fa8693/hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroParquetReader.java#LL161C1-L167C6
IMO, we may need a patch to roll back the change to the clustering operator.
After that, we should unify the Flink and Spark Parquet schemas, but that is a breaking change. WDYT?
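For reference, if unification targets the modern 3-level layout from the Parquet logical-types specification (this is the spec's recommended form, not a statement of what Hudi will actually adopt), the `f_map` field would look like this, with the repeated group named `key_value` and the key `required`, as in Spark's output:

```
optional group f_map (MAP) {
  repeated group key_value {
    required binary key (STRING);
    optional int32 value;
  }
}
```

Flink's current output already uses the `key_value` group name but deviates from the spec by marking the key `optional`, while Spark's output uses the legacy `map (MAP_KEY_VALUE)` naming with the spec-compliant `required` key.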
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]