rahil-c commented on code in PR #13756:
URL: https://github.com/apache/hudi/pull/13756#discussion_r2296839504
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -211,17 +211,19 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: String,
 val engineContext = new HoodieSparkEngineContext(new JavaSparkContext(spark.sparkContext))
 val maxMemoryPerCompaction = IOUtils.getMaxMemoryPerCompaction(engineContext.getTaskContextSupplier, options.asJava)
+ // Create metaclient on driver to avoid expensive operations on executors
+ val storageConf = new HadoopStorageConfiguration(broadcastedStorageConf.value.value)
Review Comment:
@yihua I have pushed a commit that tries using the `augmentedStorageConf`
directly. I wonder, though, whether this is safe: I thought the reason for using
`val broadcastedStorageConf = spark.sparkContext.broadcast(new
SerializableConfiguration(augmentedStorageConf.unwrap()))` was that the
`augmentedStorageConf` may not be serializable, because it wraps a Hadoop
`Configuration`, which does not implement `Serializable`.
If there is an issue with the recent commit, then I think it might be good to
switch back to what it was before.
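
To make the serialization concern concrete, here is a minimal, self-contained sketch of the wrapper pattern, in the same spirit as wrapping the conf in `SerializableConfiguration` before broadcasting. It does not use Hadoop, Spark, or Hudi classes: `PlainConf`, `SerializableConfWrapper`, and `roundTrip` are hypothetical names made up for illustration, and `PlainConf` only stands in for a config class that does not implement `Serializable`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException}
import java.io.{ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object SerializableWrapperSketch {

  // Hypothetical stand-in for a config class that is NOT java.io.Serializable.
  final class PlainConf {
    private val entries = mutable.LinkedHashMap.empty[String, String]
    def set(key: String, value: String): Unit = entries(key) = value
    def get(key: String): Option[String] = entries.get(key)
    def toSeq: Seq[(String, String)] = entries.toSeq
  }

  // Serializable wrapper: the wrapped conf is transient, so Java serialization
  // skips it, and its key/value pairs are written and read back by hand.
  final class SerializableConfWrapper(@transient var value: PlainConf) extends Serializable {
    private def writeObject(out: ObjectOutputStream): Unit = {
      out.defaultWriteObject()
      val pairs = value.toSeq
      out.writeInt(pairs.size)
      pairs.foreach { case (k, v) =>
        out.writeUTF(k)
        out.writeUTF(v)
      }
    }

    private def readObject(in: ObjectInputStream): Unit = {
      in.defaultReadObject()
      value = new PlainConf
      val count = in.readInt()
      (0 until count).foreach { _ =>
        val k = in.readUTF()
        val v = in.readUTF()
        value.set(k, v)
      }
    }
  }

  // Serialize and deserialize an object in memory, roughly what happens when
  // a value is shipped from the driver to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val conf = new PlainConf
    conf.set("fs.defaultFS", "file:///")

    // Shipping the raw object fails because PlainConf is not Serializable.
    val rawFails =
      try { roundTrip(conf); false }
      catch { case _: NotSerializableException => true }

    // Shipping the wrapper works and restores the entries on the other side.
    val restored = roundTrip(new SerializableConfWrapper(conf))
    println(s"raw conf fails: $rawFails, restored: ${restored.value.get("fs.defaultFS")}")
  }
}
```

Whether `augmentedStorageConf` itself needs this kind of wrapping depends on how its class handles serialization, which this sketch does not answer.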
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala:
##########
@@ -211,17 +211,19 @@ class HoodieFileGroupReaderBasedFileFormat(tablePath: String,
 val engineContext = new HoodieSparkEngineContext(new JavaSparkContext(spark.sparkContext))
 val maxMemoryPerCompaction = IOUtils.getMaxMemoryPerCompaction(engineContext.getTaskContextSupplier, options.asJava)
+ // Create metaclient on driver to avoid expensive operations on executors
+ val storageConf = new HadoopStorageConfiguration(broadcastedStorageConf.value.value)
+ val metaClient: HoodieTableMetaClient = HoodieTableMetaClient
+ .builder().setConf(storageConf).setBasePath(tablePath).build
Review Comment:
Shall I create a follow-up JIRA for this?