wzx140 commented on code in PR #6745:
URL: https://github.com/apache/hudi/pull/6745#discussion_r982611143
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -867,18 +867,20 @@ object HoodieSparkSqlWriter {
hoodieRecord
}).toJavaRDD()
case HoodieRecord.HoodieRecordType.SPARK =>
+ log.info("Use spark record")
// ut will use AvroKeyGenerator, so we need to cast it in spark record
val sparkKeyGenerator =
keyGenerator.asInstanceOf[SparkKeyGeneratorInterface]
+        val schemaWithMetaField = HoodieAvroUtils.addMetadataFields(schema, config.allowOperationMetadataField)
         val structType = HoodieInternalRowUtils.getCachedSchema(schema)
-        val structTypeBC = SparkContext.getOrCreate().broadcast(structType)
-        HoodieInternalRowUtils.addCompressedSchema(structType)
+        val structTypeWithMetaField = HoodieInternalRowUtils.getCachedSchema(schemaWithMetaField)
+        val structTypeBC = sparkContext.broadcast(structType)
+        HoodieInternalRowUtils.broadcastCompressedSchema(List(structType, structTypeWithMetaField), sparkContext)
Review Comment:
I don't agree that broadcast does not work. I have tested this on YARN
and it works. The broadcast process is: the driver puts the broadcast
variable into its BlockManager; each executor first looks the block up
locally by its block id, and only pulls it from the remote driver when it
is not available locally.
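The lookup flow described above can be sketched as a toy model (illustrative only; the class and method names here are hypothetical and do not reflect Spark's actual BlockManager API):

```java
import java.util.HashMap;
import java.util.Map;

public class BroadcastSketch {
    // Driver-side block store (stands in for the driver's BlockManager).
    static final Map<String, Object> driverBlocks = new HashMap<>();

    static class Executor {
        // Executor-local cache of already-fetched blocks.
        final Map<String, Object> localBlocks = new HashMap<>();
        int remoteFetches = 0;

        Object readBroadcast(String blockId) {
            // First try the executor-local store by block id...
            Object value = localBlocks.get(blockId);
            if (value == null) {
                // ...and pull from the remote driver only on a local miss.
                remoteFetches++;
                value = driverBlocks.get(blockId);
                localBlocks.put(blockId, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        driverBlocks.put("broadcast_0", "StructType(uuid: string)");
        Executor exec = new Executor();
        // Two tasks on the same executor: only the first read goes remote.
        exec.readBroadcast("broadcast_0");
        exec.readBroadcast("broadcast_0");
        System.out.println("remoteFetches=" + exec.remoteFetches);
        // prints "remoteFetches=1"
    }
}
```

This is why a broadcast schema is cheap after the first task on each executor: subsequent reads are local cache hits.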
If the broadcast works, we do not need to pass the record's schema
externally, right? Passing the record's schema externally requires every
user of HoodieRecord to pass the right schema.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]