taklwu commented on a change in pull request #2113:
URL: https://github.com/apache/hbase/pull/2113#discussion_r462667454



##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/InitMetaProcedure.java
##########
@@ -71,7 +71,11 @@ private static void writeFsLayout(Path rootDir, Configuration conf) throws IOExc
     LOG.info("BOOTSTRAP: creating hbase:meta region");
     FileSystem fs = rootDir.getFileSystem(conf);
     Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
-    if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
+    boolean removeMeta = conf.getBoolean(HConstants.REMOVE_META_ON_RESTART,

Review comment:
   Sorry for the late reply. I have reread the code and come up with the following.
   
   First of all, `partial meta` in the current logic should mean that the 
Procedure WAL of `InitMetaProcedure` did not succeed and `INIT_META_ASSIGN_META` 
was not completed. Currently, even if the meta table can be read and a Table 
Descriptor can be retrieved, it is still considered partial as long as it is 
not assigned (correct me if I'm wrong). So, in short, a partial meta table 
cannot be identified by reading the tableinfo or the storefiles alone. 
   
   Further, in the normal cases, defining `partial meta` requires looking at a 
combination of WALs, Procedure WALs and ZooKeeper data. But for the cloud use 
case, or other use cases where one of these is missing, we need a separate 
discussion. For example: 
   
   1. partial meta on a long-running HDFS cluster
      a. if we have WALs and ZK, meta will be reassigned normally
      b. if we have WALs but no ZK, the master will not submit a new 
`InitMetaProcedure` or enter any of its states, because it finds the old 
`InitMetaProcedure` in the WAL. The old server is then handled by submitting an 
SCP, and the assignment manager does nothing. The Master hangs and does not 
finish initialization. (this is a different problem from the cloud case)
      c. if we have no WALs but have ZK, `state=OPEN` remains for `hbase:meta`, 
so when opening an existing meta region, `InitMetaProcedure` will not be 
submitted/entered either (see this section in 
[`HMaster`](https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1051-L1060)).
 The Master hangs and does not finish initialization. (this is a different 
problem from the cloud case)
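   
   To make case 1 easier to scan, here is a rough sketch of the same table as 
code. `PartialMetaCases`, `Outcome` and `classify` are hypothetical names for 
illustration only, not HBase APIs, and the fourth combination (no WALs, no ZK) 
is only labelled here because it is the cloud case.

```java
// Illustrative only: summarizes cases 1a-1c from this comment.
// None of these names exist in HBase.
final class PartialMetaCases {
  enum Outcome { REASSIGNED_NORMALLY, MASTER_HANGS, CLOUD_CASE }

  static Outcome classify(boolean hasWals, boolean hasZkData) {
    if (hasWals && hasZkData) {
      return Outcome.REASSIGNED_NORMALLY; // case 1a
    }
    if (hasWals || hasZkData) {
      return Outcome.MASTER_HANGS; // case 1b (WALs only) or 1c (ZK only)
    }
    return Outcome.CLOUD_CASE; // neither WALs nor ZK data
  }

  public static void main(String[] args) {
    boolean[] values = {true, false};
    for (boolean wals : values) {
      for (boolean zk : values) {
        System.out.println("WALs=" + wals + ", ZK=" + zk + " -> " + classify(wals, zk));
      }
    }
  }
}
```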
   
   Therefore, for this PR, if we only focus on the cloud use cases, handling 
the unknown servers and `partial meta` becomes much simpler. E.g. when running 
`InitMetaProcedure`, the clusterID in ZooKeeper (suggested by Anoop) can be 
used to indicate whether it is a partial meta, i.e. whether the ZK data is 
fresh, in which case the Region WALs and the procedure WAL of 
`InitMetaProcedure` may not exist. And if the WAL and procedure WAL exist, it 
falls into the same failure as case 1b above (out of scope for this PR). 
   
   2. partial meta on Cloud without WALs and ZK
     a. if we're in `INIT_META_WRITE_FS_LAYOUT` and continue, then ZK should 
have existed when the master restarts. Otherwise, for the case of having WALs 
and no ZK, we fall back to case 1b and we don't handle it within this PR.
     b. if there is no WAL and no ZK, the master submits an `InitMetaProcedure` 
but the procedure lands in `INIT_META_WRITE_FS_LAYOUT`
      * during `INIT_META_WRITE_FS_LAYOUT`, if we find that ZK data does not 
exist but a meta directory does, we should trust it and try to open it.
         * we run this `INIT_META_WRITE_FS_LAYOUT` state only when ZK does not 
exist or `INIT_META_WRITE_FS_LAYOUT` didn't finish previously.
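   
   As a rough sketch of the decision described for case 2b (not the actual 
`InitMetaProcedure` code): `WriteFsLayoutDecision`, `Action` and `decide` are 
hypothetical names, and the two booleans stand in for whatever checks the real 
change would use for "meta directory exists" and "ZK data exists".

```java
// Illustrative only: models the choice made during INIT_META_WRITE_FS_LAYOUT.
// All names here are hypothetical and do not exist in HBase.
final class WriteFsLayoutDecision {
  enum Action { CREATE_META_DIR, RECREATE_META_DIR, REUSE_EXISTING_META_DIR }

  static Action decide(boolean metaDirExists, boolean zkDataExists) {
    if (!metaDirExists) {
      // Nothing on the file system yet: normal bootstrap.
      return Action.CREATE_META_DIR;
    }
    if (!zkDataExists) {
      // Case 2b: no ZK data but a meta directory exists, so trust it and open it.
      return Action.REUSE_EXISTING_META_DIR;
    }
    // ZK data exists, so the state did not finish previously:
    // keep the existing behaviour of deleting and recreating the directory.
    return Action.RECREATE_META_DIR;
  }

  public static void main(String[] args) {
    System.out.println(decide(false, false));
    System.out.println(decide(true, false));
    System.out.println(decide(true, true));
  }
}
```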
   
   So, we're fixing case 2b in this PR, and I have come up with a 
[prototype](https://github.com/taklwu/hbase/commit/a25e72e811740127c306e70137c1ff4457cc34c7)
 and unit tests are running off this PR now 
(`TestClusterRestartFailoverSplitWithoutZk` is failing even without our changes 
on branch-2). 
   
   The proposed changes are:
   * Only reassign regions on unknown servers when there are no PE WALs, no 
Region WALs and no ZK data
   * Do not recreate the meta table directory if `InitMetaProcedure` restarts 
at `INIT_META_WRITE_FS_LAYOUT` with no ZK data (or maybe no WALs as well). 
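   
   The first proposed change boils down to a single guard. A minimal sketch, 
assuming three boolean inputs that the real code would have to derive from the 
file system and ZooKeeper; `UnknownServerReassignment` and `shouldReassign` are 
hypothetical names, not taken from the prototype:

```java
// Illustrative only: the guard for reassigning regions on unknown servers.
// Names and parameters are hypothetical.
final class UnknownServerReassignment {
  static boolean shouldReassign(boolean hasProcedureWals, boolean hasRegionWals,
      boolean hasZkData) {
    // Reassign only when all three sources of prior cluster state are missing.
    return !hasProcedureWals && !hasRegionWals && !hasZkData;
  }

  public static void main(String[] args) {
    System.out.println(shouldReassign(false, false, false));
    System.out.println(shouldReassign(true, false, false));
  }
}
```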




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

