yihua commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1367807326
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##########
@@ -411,10 +414,14 @@ object ExpressionPayload {
     parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-    ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-      s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-    parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): Schema = {
+    if (isPartialUpdate) {
+      parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))
Review Comment:
It depends on what the `SCHEMA` header in the log block refers to. My
assumption is that the `SCHEMA` header describes the records written to the
log block: if the records are partial, `SCHEMA` is partial, and a separate
header marks the block as partial, to differentiate a partial update from
schema evolution. Either way, do you agree that we need to keep both the full
and partial schemas in the log block header?
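
To make the suggestion concrete, here is a minimal sketch of the header layout being proposed. The key names (`SCHEMA`, `FULL_SCHEMA`, `IS_PARTIAL`) and the helper methods are illustrative assumptions, not Hudi's actual `HeaderMetadataType` constants or APIs:

```scala
// Hypothetical sketch: a log block header carrying the schema of the records
// actually written, the full table schema, and a flag that distinguishes a
// partial update from ordinary schema evolution. Names are illustrative.
object LogBlockHeaderSketch {
  val SCHEMA = "SCHEMA"             // schema of the records in this block
  val FULL_SCHEMA = "FULL_SCHEMA"   // full table schema, kept alongside
  val IS_PARTIAL = "IS_PARTIAL"     // marks the block as a partial update

  // Writer side: only a partial write needs both schemas plus the flag.
  def buildHeaders(fullSchemaJson: String,
                   writtenSchemaJson: String): Map[String, String] = {
    if (writtenSchemaJson != fullSchemaJson)
      Map(SCHEMA -> writtenSchemaJson,
          FULL_SCHEMA -> fullSchemaJson,
          IS_PARTIAL -> "true")
    else
      Map(SCHEMA -> fullSchemaJson)
  }

  // Reader side: merge against the full schema when the block is partial,
  // otherwise the block's own SCHEMA header is already the full schema.
  def schemaForMerge(headers: Map[String, String]): String =
    if (headers.get(IS_PARTIAL).contains("true"))
      headers.getOrElse(FULL_SCHEMA, headers(SCHEMA))
    else
      headers(SCHEMA)
}
```

The point of the sketch is only that both schemas survive in the header, so a reader can tell "these records are partial against this full schema" apart from "the table schema itself evolved".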
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]