[ 
https://issues.apache.org/jira/browse/HBASE-21444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679097#comment-16679097
 ] 

Josh Elser commented on HBASE-21444:
------------------------------------

Thinking some more..

{code}
+      RegionState rs = this.assignmentManager.getRegionStates()
+          .getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO);
+      if (rs.isOpened()) {
+        if (this.getServerManager().isServerOnline(rs.getServerName())) {
+          return true;
+        } else {
+          // As the meta is OPENED but the server is not online , it means there can be an SCP if it
+          // is crashed which will be transitioning meta
+          Optional<ServerCrashProcedure> optSCP = this.procedureExecutor.getProcedures().stream()
+              .filter(p -> p instanceof ServerCrashProcedure).map(m -> ((ServerCrashProcedure) m))
+              .filter(
+                scp -> (scp.hasMetaTableRegion() && scp.getServerName().equals(rs.getServerName())))
+              .findAny();
+          LOG.warn(
+            "{} is NOT online; state={}; ServerCrashProcedures={}. Master startup cannot "
+                + "progress, in holding-pattern until region onlined.",
+            RegionInfoBuilder.FIRST_META_REGIONINFO.getRegionNameAsString(), rs,
+            optSCP.isPresent());
+          // we have not found the SCP for the server and the server is also not online yet , it
+          // is better to expire it
+          if (!optSCP.isPresent()) {
+            if (!this.getServerManager().isServerOnline(rs.getServerName())
+                && !this.getServerManager().getOnlineServers().isEmpty()) {
+              // Scheduling SCP for the server if there are any live server available
+              getAssignmentManager().submitServerCrash(rs.getServerName(), true);
+            }
+          }
+        }
+      }
{code}

This is trying to detect one specific state: we think meta is OPEN, but the 
server it's listed on is dead and we have no SCP for that dead server.

I think what we really want to detect is when meta is _anything_ but OPEN and 
we don't have something queued to fix it. Regardless of how we get there, 
wouldn't it be good for us to see "hey, meta is OFFLINE, shouldn't we try to 
open it?". I just hate the idea that we don't do everything in our power to 
keep meta OPEN.
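
To make the difference between the two checks concrete, here is a rough sketch. It uses made-up stand-in types rather than real HBase classes: MetaRecoveryCheck, its State enum, and the boolean flags are all hypothetical, and how the master would actually decide that "something is queued" is left open.

```java
// Hypothetical, simplified stand-ins for illustration only; these are not HBase classes.
final class MetaRecoveryCheck {

  /** A simplified subset of region states, assumed for this sketch. */
  enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

  /**
   * The narrow case the patch looks for: meta is recorded as OPEN, the server
   * it is listed on is not online, and no SCP is queued for that server.
   */
  static boolean openOnDeadServerWithoutScp(State metaState, boolean hostOnline,
      boolean scpQueued) {
    return metaState == State.OPEN && !hostOnline && !scpQueued;
  }

  /**
   * The broader case suggested in this comment: meta is in any state other
   * than OPEN and nothing is queued that would bring it back.
   */
  static boolean notOpenAndUnattended(State metaState, boolean recoveryQueued) {
    return metaState != State.OPEN && !recoveryQueued;
  }
}
```

With these definitions, a meta region that is OFFLINE with nothing queued is missed by the first check and caught by the second, which is the gap I'm worried about.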

Glancing through HBASE-21035, I'm not sure if [~Apache9] would agree with me in 
this general case, or if his objection was more around "how would we lose all 
procedures 'normally'?" :)

> Recover meta in case of long ago dead region server appear in meta znode
> ------------------------------------------------------------------------
>
>                 Key: HBASE-21444
>                 URL: https://issues.apache.org/jira/browse/HBASE-21444
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.2
>            Reporter: Ankit Singhal
>            Assignee: Ankit Singhal
>            Priority: Major
>         Attachments: HBASE-21444.branch-2.0.001.patch, 
> HBASE-21444.branch-2.0.002.patch
>
>
> Ambari metric server uses HBase as storage and currently has different 
> znodes (/hbase-unsecure and /hbase-secure) to differentiate secure and 
> unsecure deployments of HBase.  
> It also supports rolling the cluster back from kerberized to 
> non-kerberized, which includes a step that changes the znode from 
> /hbase-secure to /hbase-unsecure. With HBase 2.0, however, the 
> meta-region-server znode under the old ZooKeeper znode will point to a 
> regionserver that is long gone, and there will be no procedure to transition 
> it, so meta stays stuck indefinitely.
> One option is to clear the znodes before rolling back, but since this used to 
> work in prior releases thanks to RecoverMetaProcedure, the ask is whether we 
> can fix meta assignment when the wrong state is present in the znode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
