saintstack commented on a change in pull request #3417:
URL: https://github.com/apache/hbase/pull/3417#discussion_r660162631



##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##########
@@ -1047,6 +1074,29 @@ private void 
finishActiveMasterInitialization(MonitoredTask status)
     // Set master as 'initialized'.
     setInitialized(true);
 
+    if (tableFamilyDesc == null && replBarrierFamilyDesc == null) {
+      // create missing CFs in meta table after master is set to 'initialized'.
+      createMissingCFsInMetaDuringUpgrade(metaDescriptor);
+
+      // Throwing this Exception to abort active master is painful but this
+      // seems the only way to add missing CFs in meta while upgrading from
+      // HBase 1 to 2 (where HBase 2 has HBASE-23055 & HBASE-23782 checked-in).
+      // So, why do we abort active master after adding missing CFs in meta?
+      // When we reach here, we would have already bypassed 
NoSuchColumnFamilyException
+      // in initClusterSchemaService(), meaning ClusterSchemaService is not
+      // correctly initialized but we bypassed it. Similarly, we bypassed
+      // tableStateManager.start() as well. Hence, we should better abort
+      // current active master because our main task - adding missing CFs
+      // in meta table is done (possible only after master state is set as
+      // initialized) at the expense of bypassing few important tasks as part
+      // of active master init routine. So now we abort active master so that
+      // next active master init will not face any issues and all mandatory
+      // services will be started during master init phase.

Review comment:
       Nice note.

##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##########
@@ -1047,6 +1074,29 @@ private void 
finishActiveMasterInitialization(MonitoredTask status)
     // Set master as 'initialized'.
     setInitialized(true);
 
+    if (tableFamilyDesc == null && replBarrierFamilyDesc == null) {
+      // create missing CFs in meta table after master is set to 'initialized'.
+      createMissingCFsInMetaDuringUpgrade(metaDescriptor);
+
+      // Throwing this Exception to abort active master is painful but this
+      // seems the only way to add missing CFs in meta while upgrading from
+      // HBase 1 to 2 (where HBase 2 has HBASE-23055 & HBASE-23782 checked-in).
+      // So, why do we abort active master after adding missing CFs in meta?
+      // When we reach here, we would have already bypassed 
NoSuchColumnFamilyException
+      // in initClusterSchemaService(), meaning ClusterSchemaService is not
+      // correctly initialized but we bypassed it. Similarly, we bypassed
+      // tableStateManager.start() as well. Hence, we should better abort
+      // current active master because our main task - adding missing CFs
+      // in meta table is done (possible only after master state is set as
+      // initialized) at the expense of bypassing few important tasks as part
+      // of active master init routine. So now we abort active master so that
+      // next active master init will not face any issues and all mandatory
+      // services will be started during master init phase.
+      throw new IOException("Stopping active master after missing CFs are "

Review comment:
       We are not stopping, we are aborting?
   
   Would it be better to throw a PleaseRestartMeException here? (Would have to 
create it...).

##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##########
@@ -953,9 +955,25 @@ private void 
finishActiveMasterInitialization(MonitoredTask status)
     if (!waitForMetaOnline()) {
       return;
     }
+    TableDescriptor metaDescriptor = tableDescriptors.get(
+        TableName.META_TABLE_NAME);
+    final ColumnFamilyDescriptor tableFamilyDesc = metaDescriptor
+        .getColumnFamily(HConstants.TABLE_FAMILY);
+    final ColumnFamilyDescriptor replBarrierFamilyDesc =
+        metaDescriptor.getColumnFamily(HConstants.REPLICATION_BARRIER_FAMILY);
+
     this.assignmentManager.joinCluster();
     // The below depends on hbase:meta being online.
-    this.tableStateManager.start();
+    try {
+      this.tableStateManager.start();
+    } catch (NoSuchColumnFamilyException e) {
+      if (tableFamilyDesc == null && replBarrierFamilyDesc == null) {
+        LOG.info("For missing CFs in meta, this Exception is expected", e);

Review comment:
       You think operator will know what 'missing CFs in meta' is about? Would 
it be better to talk about migration from hbase1 to hbase2?

##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##########
@@ -1106,6 +1156,38 @@ private void 
finishActiveMasterInitialization(MonitoredTask status)
     }
   }
 
+  private void createMissingCFsInMetaDuringUpgrade(
+      TableDescriptor metaDescriptor) throws IOException {
+    TableDescriptor newMetaDesc =
+        TableDescriptorBuilder.newBuilder(metaDescriptor)
+            .setColumnFamily(FSTableDescriptors.getTableFamilyDesc(conf))
+            .setColumnFamily(FSTableDescriptors.getReplBarrierFamilyDesc())
+            .build();
+    long pid = this.modifyTable(TableName.META_TABLE_NAME, () -> newMetaDesc,
+        0, 0, false);
+    int tries = 30;
+    while (!(getMasterProcedureExecutor().isFinished(pid))
+        && getMasterProcedureExecutor().isRunning() && tries > 0) {
+      try {
+        Thread.sleep(1000);
+      } catch (InterruptedException e) {
+        throw new IOException("Wait interrupted", e);
+      }
+      tries--;
+    }
+    if (tries <= 0) {
+      throw new IOException(

Review comment:
       Rather than new IOE... throw a new HBaseIOE?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to