voonhous commented on code in PR #17751:
URL: https://github.com/apache/hudi/pull/17751#discussion_r2653289325


##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -478,6 +485,120 @@ public static HoodieSchema createUUID() {
     return new HoodieSchema(uuidSchema);
   }
 
+  /**
+   * Creates an unshredded Variant schema.

Review Comment:
   I don't quite understand this question. The implementation follows the 
Parquet schema spec.
   
   Nonetheless, I would still like to understand what you're pushing towards.
   Do you mean whether we can have both an `unshredded_typed_column` and a 
`shredded_typed_column` in the dataset?
   
   Or are you saying that, since `shredded_variant` typed columns can hold 
unshredded data, we should just maintain the shredded type?
   
   # Unshredded
   ```
   optional group variant_unshredded (VARIANT) {
     required binary metadata;
     required binary value;
   }
   ```
   
   # Shredded
   ```
   optional group variant_shredded (VARIANT) {
     required binary metadata;
     optional binary value;
     optional int64 typed_value;
   }
   ```
   
   So, to use the shredded schema to represent unshredded data, we can just 
leave `typed_value` null and populate `value`?
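   
   A minimal Java sketch of that reading, to make the question concrete. 
`ShreddedVariantRow` and its methods are hypothetical names made up for 
illustration; they are not part of Hudi or Parquet. The sketch assumes a 
single `int64` `typed_value` as in the shredded schema above, and only models 
which fields are null:
   
   ```java
   import java.util.Objects;

   // Hypothetical model of one row of the shredded VARIANT group above.
   // Not a Hudi or Parquet API; it only tracks which fields are populated.
   final class ShreddedVariantRow {
     final byte[] metadata;  // required binary metadata
     final byte[] value;     // optional binary value (null when absent)
     final Long typedValue;  // optional int64 typed_value (null when absent)

     private ShreddedVariantRow(byte[] metadata, byte[] value, Long typedValue) {
       this.metadata = Objects.requireNonNull(metadata, "metadata is required");
       this.value = value;
       this.typedValue = typedValue;
     }

     // Unshredded data in the shredded layout: value set, typed_value null.
     static ShreddedVariantRow unshredded(byte[] metadata, byte[] value) {
       return new ShreddedVariantRow(
           metadata, Objects.requireNonNull(value, "value is required"), null);
     }

     // Data that matched the shredded type: typed_value set, value null.
     static ShreddedVariantRow shredded(byte[] metadata, long typedValue) {
       return new ShreddedVariantRow(metadata, null, typedValue);
     }

     // True when the row carries its data only in the binary value field.
     boolean isUnshredded() {
       return value != null && typedValue == null;
     }
   }
   ```
   
   Under these assumptions, `ShreddedVariantRow.unshredded(meta, bytes)` 
carries the same information as a row of the unshredded group, just with an 
extra null `typed_value` field.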



