danny0405 commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1368076982
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -411,10 +414,14 @@ object ExpressionPayload {
     parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-    ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-      s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-    parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): Schema = {
+    if (isPartialUpdate) {
+      parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))
Review Comment:
I'm not really in favor of having two schemas here. Partial update is actually a
special case of schema evolution (one that only adds new columns and never
drops columns), so the writer should still always take the full schema. We can
persist the partial schema to the log block for some optimization purposes
(for example, the handling of missing values as nulls: with the partial schema
we can tell whether a null value comes from a partial update or whether it
should be force-updated to null).
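
To make the null-handling argument concrete, here is a minimal, hypothetical Scala sketch (not Hudi's actual API; `mergeField` and `partialSchemaFields` are made-up names, and field-name sets stand in for Avro schemas). It shows how a persisted partial schema lets the merger distinguish "null because the field was not part of this partial update" (keep the existing value) from "null explicitly written by the update" (force the field to null):

```scala
// Hypothetical sketch of partial-update null semantics, not Hudi code.
// partialSchemaFields is the set of fields the partial update actually wrote.
object PartialUpdateNullSemantics {
  def mergeField(field: String,
                 incoming: Option[Any],
                 existing: Option[Any],
                 partialSchemaFields: Set[String]): Option[Any] =
    incoming match {
      // A real value was supplied by the update: take it.
      case Some(v) => Some(v)
      // Null for a field the partial update wrote: an explicit overwrite to null.
      case None if partialSchemaFields(field) => None
      // Null for a field outside the partial schema: it was simply not
      // provided by this update, so keep the existing value.
      case None => existing
    }

  def main(args: Array[String]): Unit = {
    val partial = Set("name") // only "name" was written by this partial update
    println(mergeField("age", None, Some(42), partial))   // Some(42): old value kept
    println(mergeField("name", None, Some("a"), partial)) // None: forced to null
  }
}
```

Without the persisted partial schema, both cases look identical to the merger (a null in a full-schema record), which is exactly the ambiguity described above.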
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]