[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

stack (Commented) (JIRA) Mon, 27 Feb 2012 10:23:47 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217347#comment-13217347
 ]


stack commented on HBASE-5270:
------------------------------

Do you think we should check to see if we have already split this server's log 
for the case where the server was carrying root and meta?

{code}
+      splitLogIfOnline(currentMetaServer);
{code}

Or will the above call become a noop because we just split it before we 
assignedroot?

Is this a 'safe mode' or is it the master 'initializing'?  I think 'safe mode' 
makes folks think of hdfs.  It is a little similar in that master is trying to 
make sense of the cluster but initializing might be a better name for this 
state.

BTW, I think this is an improvement over previous versions of this patch.  Its 
easier to reason about.  Good stuff Chunhui.

Make a method and put this duplicated code into it and call it from the two 
places its repeated:

{code}
+    if (!deadNotExpiredServers.isEmpty()) {
+      for (final ServerName server : deadNotExpiredServers) {
+        LOG.debug("Removing dead but not expired server: " + server
+            + " from eligible server pool.");
+        servers.remove(server);
+      }
+    }
{code}

Fix this bit of javadoc '... but not are expired now.'

You don't need this:

{code}
+ * Copyright 2007 The Apache Software Foundation
{code}

I think MasterInSafeModeException becomes MasterInitializingException?

Good stuff Chunhui

Regards Jimmy's comment:

bq. Instead of introducing safe mode, can we add something to the RPC server 
and don't allow it to sever traffic before the actual server is ready, for 
example, fully initialized?

We have a ServerNotRunningYetException down in the ipc.  Its thrown by 
HBaseServer if RPC has not started yet.  It seems a little related to this 
MasterInitializing.  We also have a PleaseHoldException.  Perhaps the Master 
should throw this instead of the MasterInitializing?  We'd throw a 
PleaseHoldException and the message would be detail that the master is 
initializing?

                
> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5270
>                 URL: https://issues.apache.org/jira/browse/HBASE-5270
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master
>            Reporter: Zhihong Yu
>            Assignee: chunhui shen
>             Fix For: 0.92.1, 0.94.0
>
>         Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt
>
>
> This JIRA continues the effort from HBASE-5179. Starting with Stack's 
> comments about patches for 0.92 and TRUNK:
> Reviewing 0.92v17
> isDeadServerInProgress is a new public method in ServerManager but it does 
> not seem to be used anywhere.
> Does isDeadRootServerInProgress need to be public? Ditto for meta version.
> This method param names are not right 'definitiveRootServer'; what is meant 
> by definitive? Do they need this qualifier?
> Is there anything in place to stop us expiring a server twice if its carrying 
> root and meta?
> What is difference between asking assignment manager isCarryingRoot and this 
> variable that is passed in? Should be doc'd at least. Ditto for meta.
> I think I've asked for this a few times - onlineServers needs to be 
> explained... either in javadoc or in comment. This is the param passed into 
> joinCluster. How does it arise? I think I know but am unsure. God love the 
> poor noob that comes awandering this code trying to make sense of it all.
> It looks like we get the list by trawling zk for regionserver znodes that 
> have not checked in. Don't we do this operation earlier in master setup? Are 
> we doing it again here?
> Though distributed split log is configured, we will do in master single 
> process splitting under some conditions with this patch. Its not explained in 
> code why we would do this. Why do we think master log splitting 'high 
> priority' when it could very well be slower. Should we only go this route if 
> distributed splitting is not going on. Do we know if concurrent distributed 
> log splitting and master splitting works?
> Why would we have dead servers in progress here in master startup? Because a 
> servershutdownhandler fired?
> This patch is different to the patch for 0.90. Should go into trunk first 
> with tests, then 0.92. Should it be in this issue? This issue is really hard 
> to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
> this trunk patch?
> This patch needs to have the v18 differences applied.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

Reply via email to