prashantwason commented on a change in pull request #3836:
URL: https://github.com/apache/hudi/pull/3836#discussion_r743981870



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##########
@@ -95,10 +95,19 @@ public SparkRDDWriteClient(HoodieEngineContext context, HoodieWriteConfig writeC
   public SparkRDDWriteClient(HoodieEngineContext context, HoodieWriteConfig writeConfig,
                              Option<EmbeddedTimelineService> timelineService) {
     super(context, writeConfig, timelineService);
+    bootstrapMetadataTable();
+  }
+
+  private void bootstrapMetadataTable() {
     if (config.isMetadataTableEnabled()) {
-      // If the metadata table does not exist, it should be bootstrapped here
-      // TODO: Check if we can remove this requirement - auto bootstrap on commit
-      SparkHoodieBackedTableMetadataWriter.create(context.getHadoopConf().get(), config, context);
+      // Defer bootstrap if upgrade / downgrade is pending
+      HoodieTableMetaClient metaClient = createMetaClient(true);
+      UpgradeDowngrade upgradeDowngrade = new UpgradeDowngrade(
+          metaClient, config, context, SparkUpgradeDowngradeHelper.getInstance());
+      if (!upgradeDowngrade.needsUpgradeOrDowngrade(HoodieTableVersion.current())) {

Review comment:
       Yes, the bootstrap will happen the next time the SparkRDDWriteClient is created (probably during the next clean).
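   The deferral can be illustrated with a toy model. The classes below (DeferredBootstrapSketch, ToyTable, ToyWriteClient, CURRENT_VERSION) are hypothetical stand-ins, not Hudi APIs: the "table" is only a version number plus a flag for whether the metadata table exists, and a plain version comparison stands in for needsUpgradeOrDowngrade(). Under those assumptions, it sketches how the guard skips the bootstrap on the run that performs the upgrade and bootstraps once when the next client is created.

```java
// Hypothetical toy model of the deferral guard in bootstrapMetadataTable().
// ToyTable / ToyWriteClient are illustrative stand-ins, NOT Hudi classes.
public class DeferredBootstrapSketch {

  // Assumed table version of the client code (e.g. 3 for 0.10).
  static final int CURRENT_VERSION = 3;

  // Minimal stand-in for the table state on storage.
  static final class ToyTable {
    int version;
    boolean metadataTableExists;
    int bootstrapCount;

    ToyTable(int version) {
      this.version = version;
    }
  }

  // Stand-in for the write client with the patched constructor.
  static final class ToyWriteClient {
    private final ToyTable table;

    ToyWriteClient(ToyTable table) {
      this.table = table;
      bootstrapMetadataTable();
    }

    private void bootstrapMetadataTable() {
      // Simplified stand-in for needsUpgradeOrDowngrade(current version).
      boolean needsUpgradeOrDowngrade = table.version != CURRENT_VERSION;
      // Defer the bootstrap while an upgrade/downgrade is pending.
      if (!needsUpgradeOrDowngrade && !table.metadataTableExists) {
        table.metadataTableExists = true;
        table.bootstrapCount++;
      }
    }

    // Stand-in for insert(): the upgrade step deletes the metadata table.
    void insert() {
      if (table.version != CURRENT_VERSION) {
        table.metadataTableExists = false;
        table.version = CURRENT_VERSION;
      }
      // ... write records ...
    }
  }

  public static void main(String[] args) {
    ToyTable table = new ToyTable(2); // existing table at version 2

    // Run 1: bootstrap is deferred, then insert() performs the upgrade.
    new ToyWriteClient(table).insert();
    System.out.println("after run 1: bootstraps=" + table.bootstrapCount
        + ", metadataTableExists=" + table.metadataTableExists);

    // Run 2: no upgrade is pending, so the constructor bootstraps once.
    new ToyWriteClient(table);
    System.out.println("after run 2: bootstraps=" + table.bootstrapCount
        + ", metadataTableExists=" + table.metadataTableExists);
  }
}
```

   In this model the first run ends with zero bootstraps and no metadata table, and the second run performs the single bootstrap, so no bootstrap work is thrown away by the upgrade.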
   
   Currently, this is what happens (assuming an existing table at version 2 and 0.10 code, which uses table version 3):
   1. SparkRDDWriteClient constructor: finds no metadata table, so it bootstraps one (wasted bootstrap).
   2. SparkRDDWriteClient.insert(): runs the upgrade code in getTableAndXXX(), which deletes the metadata table.
   
   Next run:
   1. SparkRDDWriteClient constructor: finds no metadata table, so it bootstraps one again (second bootstrap).
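   The current sequence can be reproduced with a small simulation. EagerBootstrapSketch, ToyTable and ToyWriteClient are hypothetical and only mimic the behavior described in this comment, not Hudi's actual classes: the constructor bootstraps whenever the metadata table is missing, and the upgrade inside insert() deletes it. It is a sketch for counting bootstraps across two runs, not a model of Hudi internals.

```java
// Hypothetical toy model of the pre-patch flow described in this comment.
// ToyTable / ToyWriteClient are illustrative stand-ins, NOT Hudi classes.
public class EagerBootstrapSketch {

  // Assumed table version of the client code (e.g. 3 for 0.10).
  static final int CURRENT_VERSION = 3;

  // Minimal stand-in for the table state on storage.
  static final class ToyTable {
    int version;
    boolean metadataTableExists;
    int bootstrapCount;

    ToyTable(int version) {
      this.version = version;
    }
  }

  static final class ToyWriteClient {
    private final ToyTable table;

    ToyWriteClient(ToyTable table) {
      this.table = table;
      // Pre-patch: bootstrap whenever the metadata table is missing,
      // even though a pending upgrade is about to delete it.
      if (!table.metadataTableExists) {
        table.metadataTableExists = true;
        table.bootstrapCount++;
      }
    }

    // Stand-in for insert(): the upgrade step (getTableAndXXX() in the
    // comment) deletes the metadata table before writing.
    void insert() {
      if (table.version != CURRENT_VERSION) {
        table.metadataTableExists = false;
        table.version = CURRENT_VERSION;
      }
      // ... write records ...
    }
  }

  public static void main(String[] args) {
    ToyTable table = new ToyTable(2); // existing table at version 2

    // Run 1: constructor bootstraps, then the upgrade in insert() deletes it.
    new ToyWriteClient(table).insert();
    System.out.println("after run 1: bootstraps=" + table.bootstrapCount
        + ", metadataTableExists=" + table.metadataTableExists);

    // Run 2: the constructor finds no metadata table and bootstraps again.
    new ToyWriteClient(table);
    System.out.println("after run 2: bootstraps=" + table.bootstrapCount
        + ", metadataTableExists=" + table.metadataTableExists);
  }
}
```

   Two bootstraps happen in this model, and the first one is discarded by the upgrade; that discarded work is what grows when heavier indexes are built during bootstrap.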
   
   
   For file listing alone, this wasted bootstrap is tolerable, but when other indexes are enabled alongside it (e.g. the record-level index in the metadata table), it wastes a lot of time.




-- 
This is an automated message from the Apache Git Service.