danny0405 commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1368076982
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -411,10 +414,14 @@ object ExpressionPayload {
     parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-    ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-      s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-    parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): Schema = {
+    if (isPartialUpdate) {
+      parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))
Review Comment:
I'm not really in favor of having two schemas here. Partial update is actually a
special case of schema evolution (one that only adds new columns and never
drops columns), so the writer should still always take the full schema. We can
persist the partial schema to the log block for some optimization purposes
(for example, the handling of missing values as nulls: with the partial schema
we can tell whether a null value comes from a partial update or whether it
should be force-updated to null).
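
To make the null-handling argument concrete, here is a minimal, hypothetical Scala sketch (not Hudi's actual API; `mergeField` and `partialSchemaFields` are made-up names, and field-name sets stand in for Avro schemas). It shows how a persisted partial schema lets the merger distinguish "null because the field was not part of this partial update" (keep the existing value) from "null explicitly written by the update" (force the field to null):

```scala
// Hypothetical sketch of partial-update null semantics, not Hudi code.
// partialSchemaFields is the set of fields the partial update actually wrote.
object PartialUpdateNullSemantics {
  def mergeField(field: String,
                 incoming: Option[Any],
                 existing: Option[Any],
                 partialSchemaFields: Set[String]): Option[Any] =
    incoming match {
      // A real value was supplied by the update: take it.
      case Some(v) => Some(v)
      // Null for a field the partial update wrote: an explicit overwrite to null.
      case None if partialSchemaFields(field) => None
      // Null for a field outside the partial schema: it was simply not
      // provided by this update, so keep the existing value.
      case None => existing
    }

  def main(args: Array[String]): Unit = {
    val partial = Set("name") // only "name" was written by this partial update
    println(mergeField("age", None, Some(42), partial))   // Some(42): old value kept
    println(mergeField("name", None, Some("a"), partial)) // None: forced to null
  }
}
```

Without the persisted partial schema, both cases look identical to the merger (a null in a full-schema record), which is exactly the ambiguity described above.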
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]