yihua commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1367807326
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -411,10 +414,14 @@ object ExpressionPayload {
     parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-    ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-      s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-    parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): Schema = {
+    if (isPartialUpdate) {
+      parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))
Review Comment:
It depends on what the `SCHEMA` header in the log block refers to. My
assumption is that the `SCHEMA` header describes the records written to the
log block: if the records are partial, `SCHEMA` is partial, and a separate
header marks the block as partial, to differentiate a partial update from
schema evolution. Either way, do you agree that we need to keep both the full
and partial schemas in the log block header?
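
To make the suggestion concrete, here is a minimal sketch of the header layout being proposed. The key names (`SCHEMA`, `FULL_SCHEMA`, `IS_PARTIAL`) and the helper methods are illustrative assumptions, not Hudi's actual `HeaderMetadataType` constants or APIs:

```scala
// Hypothetical sketch: a log block header carrying the schema of the records
// actually written, the full table schema, and a flag that distinguishes a
// partial update from ordinary schema evolution. Names are illustrative.
object LogBlockHeaderSketch {
  val SCHEMA = "SCHEMA"             // schema of the records in this block
  val FULL_SCHEMA = "FULL_SCHEMA"   // full table schema, kept alongside
  val IS_PARTIAL = "IS_PARTIAL"     // marks the block as a partial update

  // Writer side: only a partial write needs both schemas plus the flag.
  def buildHeaders(fullSchemaJson: String,
                   writtenSchemaJson: String): Map[String, String] = {
    if (writtenSchemaJson != fullSchemaJson)
      Map(SCHEMA -> writtenSchemaJson,
          FULL_SCHEMA -> fullSchemaJson,
          IS_PARTIAL -> "true")
    else
      Map(SCHEMA -> fullSchemaJson)
  }

  // Reader side: merge against the full schema when the block is partial,
  // otherwise the block's own SCHEMA header is already the full schema.
  def schemaForMerge(headers: Map[String, String]): String =
    if (headers.get(IS_PARTIAL).contains("true"))
      headers.getOrElse(FULL_SCHEMA, headers(SCHEMA))
    else
      headers(SCHEMA)
}
```

The point of the sketch is only that both schemas survive in the header, so a reader can tell "these records are partial against this full schema" apart from "the table schema itself evolved".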
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]