prashantwason commented on a change in pull request #3836:
URL: https://github.com/apache/hudi/pull/3836#discussion_r743981870
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##########
@@ -95,10 +95,19 @@ public SparkRDDWriteClient(HoodieEngineContext context, HoodieWriteConfig writeC
public SparkRDDWriteClient(HoodieEngineContext context, HoodieWriteConfig writeConfig,
Option<EmbeddedTimelineService> timelineService) {
super(context, writeConfig, timelineService);
+ bootstrapMetadataTable();
+ }
+
+ private void bootstrapMetadataTable() {
if (config.isMetadataTableEnabled()) {
- // If the metadata table does not exist, it should be bootstrapped here
- // TODO: Check if we can remove this requirement - auto bootstrap on commit
- SparkHoodieBackedTableMetadataWriter.create(context.getHadoopConf().get(), config, context);
+ // Defer bootstrap if upgrade / downgrade is pending
+ HoodieTableMetaClient metaClient = createMetaClient(true);
+ UpgradeDowngrade upgradeDowngrade = new UpgradeDowngrade(
+ metaClient, config, context, SparkUpgradeDowngradeHelper.getInstance());
+ if (!upgradeDowngrade.needsUpgradeOrDowngrade(HoodieTableVersion.current())) {
Review comment:
Yes, the bootstrap will happen the next time the SparkRDDWriteClient is
created (probably in the next clean).
Currently this is what happens (assuming an existing table at version 2
and 0.10 code, which expects version 3):
1. SparkRDDWriteClient constructor - finds no metadata table, so it
bootstraps one (wasted bootstrap)
2. SparkRDDWriteClient.insert() - runs the upgrade code in getTableAndXXX(),
which deletes the metadata table.
Next run:
1. SparkRDDWriteClient constructor - finds no metadata table, so it
bootstraps again (second bootstrap)
For file listing alone this wasted bootstrap is kind of OK, but if other
indexes are enabled along with the metadata table (e.g. record-level-index),
it is a lot of wasted time.
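
To make the sequence above concrete, here is a toy model of it. This is only a sketch, not Hudi code: `ToyTable`, `ToyWriteClient`, and `simulateTwoRuns` are hypothetical names invented for illustration, and the only behavior assumed is what is described above (the constructor bootstraps a missing metadata table, and the upgrade path in `insert()` deletes it).

```java
// Toy model only: none of these classes are Hudi APIs.
class BootstrapDeferralSketch {

    // Table version expected by the newer code (version 3 in the scenario above).
    static final int CURRENT_VERSION = 3;

    /** Hypothetical stand-in for the on-disk table state. */
    static class ToyTable {
        int version = 2;                     // existing table written by older code
        boolean metadataTableExists = false; // no metadata table yet
        int bootstrapCount = 0;              // how many times a bootstrap ran
    }

    /** Hypothetical stand-in for the write client. */
    static class ToyWriteClient {
        private final ToyTable table;

        ToyWriteClient(ToyTable table, boolean deferWhenUpgradePending) {
            this.table = table;
            boolean upgradePending = table.version != CURRENT_VERSION;
            if (deferWhenUpgradePending && upgradePending) {
                // Defer: the pending upgrade would delete the metadata table anyway.
                return;
            }
            if (!table.metadataTableExists) {
                table.metadataTableExists = true;
                table.bootstrapCount++;
            }
        }

        void insert() {
            if (table.version != CURRENT_VERSION) {
                // The upgrade deletes the metadata table and bumps the version.
                table.metadataTableExists = false;
                table.version = CURRENT_VERSION;
            }
        }
    }

    /** Creates a client and inserts, twice, and returns the number of bootstraps. */
    static int simulateTwoRuns(boolean defer) {
        ToyTable table = new ToyTable();
        new ToyWriteClient(table, defer).insert(); // first run
        new ToyWriteClient(table, defer).insert(); // next run
        return table.bootstrapCount;
    }

    public static void main(String[] args) {
        System.out.println("without deferral: " + simulateTwoRuns(false));
        System.out.println("with deferral: " + simulateTwoRuns(true));
    }
}
```

In this model the run without deferral bootstraps twice, while the run with deferral bootstraps once, on the second client creation after the upgrade has completed.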