ayushtkn commented on code in PR #2993:
URL: https://github.com/apache/hive/pull/2993#discussion_r849128393


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java:
##########
@@ -157,10 +159,20 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
       Path incBootstrapDir = new Path(dumpDirectory, ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME);
       if (fs.exists(incBootstrapDir)) {
         if (isSecondFailover) {
-          String[] tableList = getBootstrapTableList(dumpDirParent, hiveConf);
-          tablesToBootstrap = Arrays.asList(tableList);
-          LOG.info("Optimised bootstrap for database {} with load with bootstrap table list as {}", dbNameToLoadIn,
+          String[] bootstrappedTables = getBootstrapTableList(new Path(dumpDirectory).getParent(), hiveConf);
+          tablesToBootstrap = new ArrayList<String>(Arrays.asList(bootstrappedTables));
+          LOG.info("Optimised bootstrap for database {} with load with bootstrapped table list as {}", dbNameToLoadIn,
               tablesToBootstrap);
+          ArrayList<String> tableList = new ArrayList<String>(Arrays.asList(bootstrappedTables));
+          // Get list of tables bootstrapped.
+          Path tableMetaPath = new Path(incBootstrapDir, EximUtil.METADATA_PATH_NAME + "/" + sourceDbName);
+          FileStatus[] listing = fs.listStatus(tableMetaPath);
+          for (FileStatus tablePath : listing) {
+            tableList.remove(tablePath.getPath().getName());
+          }
+          tablesToDrop = tableList;

Review Comment:
   The two lists are different:
   tablesToBootstrap holds the tables that we need to bootstrap: the tables 
that were modified during a Disaster Recovery (DR) scenario. They exist on both 
the source and target clusters but have diverged. We overwrite each such table 
on the target cluster using bootstrap. 
   
   tablesToDrop holds the tables that exist on the target cluster but not on 
the source cluster. This also happens only in a DR scenario. Since such a table 
doesn't exist on the source cluster, we need to drop it on the target cluster 
so that the source and target clusters stay in sync in terms of the tables 
available.
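
   To make the distinction concrete, here is a minimal, self-contained sketch 
of the set difference that the loop in the diff performs. It uses plain Java 
collections instead of the Hadoop `FileSystem` listing, and the class name 
`TableDiffSketch`, the method name `tablesToDrop`, and the table names are 
hypothetical, chosen only for illustration. It also uses `removeAll` where the 
diff calls `remove` per entry, which is equivalent when table names are unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative only: not part of ReplLoadWork. All names here are hypothetical.
public class TableDiffSketch {

  /**
   * Returns the tables that appear in the bootstrap table list but have no
   * metadata directory in the source dump, i.e. the candidates to drop on
   * the target cluster.
   */
  static List<String> tablesToDrop(List<String> bootstrappedTables,
                                   Collection<String> tablesInDump) {
    // Copy first: Arrays.asList returns a fixed-size list that cannot be shrunk.
    List<String> toDrop = new ArrayList<>(bootstrappedTables);
    toDrop.removeAll(tablesInDump);
    return toDrop;
  }

  public static void main(String[] args) {
    // Stand-in for the result of getBootstrapTableList(...).
    List<String> bootstrappedTables = Arrays.asList("t1", "t2", "t3");
    // Stand-in for the directory names listed under the table metadata path.
    List<String> tablesInDump = Arrays.asList("t1", "t3");

    System.out.println("tablesToBootstrap = " + bootstrappedTables);
    System.out.println("tablesToDrop      = "
        + tablesToDrop(bootstrappedTables, tablesInDump));
  }
}
```

   With these sample inputs, only `t2` lands in the drop list, because it is 
the only entry without a matching metadata directory in the dump.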



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

