[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

chunhui shen (Commented) (JIRA) Tue, 24 Jan 2012 19:05:25 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192828#comment-13192828
 ]


chunhui shen commented on HBASE-5270:
-------------------------------------

Thanks for Stack's comments.
{code}This method param names are not right 'definitiveRootServer'; what is 
meant by definitive?  Do they need this qualifier?
{code}
When the master is initializing, carryingMeta and carryingRoot are always false 
(as can be seen in the code of ServerManager#expire()). So the parameter 
'definitiveRootServer' is used in this case to ensure that the dead root server 
is treated as carrying ROOT when it is expired.
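A minimal sketch of the decision described above, assuming server names are plain strings and that the caller knows whether the master has finished initializing. The class and method names (ExpirySketch, carryingRoot, rootLocation) are hypothetical; this is not the actual ServerManager code, only a model of why an explicitly passed root server is needed during initialization:

```java
import java.util.Objects;

// Hypothetical model, not HBase's ServerManager.
class ExpirySketch {
    /**
     * Decides whether the dead server should be handled as carrying ROOT.
     * rootLocation: where the assignment manager believes ROOT is (may be null).
     * definitiveRootServer: the dead root server passed in by the caller (may be null).
     */
    static boolean carryingRoot(String deadServer, boolean masterInitialized,
                                String rootLocation, String definitiveRootServer) {
        // Once the master is initialized, the assignment manager's view is usable.
        boolean carrying = masterInitialized && Objects.equals(deadServer, rootLocation);
        // While initializing, the check above is always false, so rely on the
        // explicitly passed root server to make sure ROOT gets reassigned.
        if (!masterInitialized && Objects.equals(deadServer, definitiveRootServer)) {
            carrying = true;
        }
        return carrying;
    }
}
```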


{code}Is there anything in place to stop us expiring a server twice if it's 
carrying root and meta?
{code}
Is it possible, with the current code, to expire a server twice when it is 
carrying both root and meta? I don't think so.
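If an explicit guard were ever wanted, one simple way to make expiry idempotent is to record each expired server name in a concurrent set and only proceed the first time. This is a hypothetical sketch (DeadServerGuard and tryExpire are illustrative names, not HBase API), assuming server names are strings:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard, not HBase code: a server carrying both ROOT and META
// would only be handed to shutdown handling once.
class DeadServerGuard {
    private final Set<String> expired = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first call with a given server name. */
    boolean tryExpire(String serverName) {
        // Set.add is atomic here and returns false if the name was already present.
        return expired.add(serverName);
    }
}
```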

{code}onlineServers needs to be explained{code}
Agreed. Previously this parameter was not passed into joinCluster, so when 
joinCluster executed, onlineServers was simply the set of currently online 
servers. That caused a problem: while joinCluster was executing, some server 
might be being processed as a dead server, so its log was being split by SSH 
while its regions were being assigned by joinCluster(), causing data loss.
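The race can be modelled as a partition of regions. This is a simplified sketch assuming region and server names are strings; regionsToReassign and deadInProgress are hypothetical names. Regions on servers in the passed-in onlineServers snapshot stay where they are, regions on servers that SSH is already processing are left to SSH (which splits the log before reassigning), and only the rest are reassigned by joinCluster:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the ownership rule, not the actual AssignmentManager code.
class JoinClusterSketch {
    /** Regions that joinCluster itself should reassign. */
    static List<String> regionsToReassign(Map<String, String> regionToServer,
                                          Set<String> onlineServers,
                                          Set<String> deadInProgress) {
        List<String> toAssign = new ArrayList<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            String server = e.getValue();
            if (onlineServers.contains(server)) {
                continue; // still served by a live server
            }
            if (deadInProgress.contains(server)) {
                continue; // SSH owns log splitting and reassignment for this server
            }
            toAssign.add(e.getKey());
        }
        return toAssign;
    }
}
```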

{code}It looks like we get the list by trawling zk for regionserver znodes that 
have not checked in.  Don't we do this operation earlier in master setup?  Are 
we doing it again here?
{code}
I can't find this operation earlier in master setup, and it was not introduced 
by this issue; I only ported this logic from trunk to 0.90.
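The list in question is essentially a set difference: servers that have a regionserver znode in ZooKeeper minus servers that have already checked in with the master. A minimal sketch, assuming both sides are available as sets of server-name strings (ZkCheckInSketch and upInZkButNotCheckedIn are hypothetical names):

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not HBase code.
class ZkCheckInSketch {
    /** Servers registered in ZooKeeper that have not yet reported to the master. */
    static Set<String> upInZkButNotCheckedIn(Set<String> zkServers, Set<String> checkedIn) {
        Set<String> missing = new TreeSet<>(zkServers); // copy; do not mutate the input
        missing.removeAll(checkedIn);
        return missing;
    }
}
```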

{code}Though distributed split log is configured, we will do in master single 
process splitting under some conditions with this patch.  It's not explained in 
code why we would do this.{code}
I agree we need to explain it. But I'm not sure whether we should avoid 
distributed log splitting in this case.

{code}Why would we have dead servers in progress here in master startup?  
Because a servershutdownhandler fired?
{code}
If a region server is killed and restarted while the master is initializing, 
processing of that dead server will be in progress during master startup.
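A restarted region server comes back with the same host and port but a new start code, so its previous server name is considered dead and gets shutdown handling. A minimal sketch of recognising such a restart, assuming names of the form "host,port,startcode" (RestartSketch and isRestartOf are hypothetical names, not HBase API):

```java
// Hypothetical helper, not HBase code. Assumes server names of the form
// "host,port,startcode", where startcode is the process start time.
class RestartSketch {
    /** True if newName has the same host and port as oldName but a later start code. */
    static boolean isRestartOf(String oldName, String newName) {
        String[] o = oldName.split(",");
        String[] n = newName.split(",");
        if (o.length != 3 || n.length != 3) {
            return false; // not in the assumed format
        }
        return o[0].equals(n[0]) && o[1].equals(n[1])
            && Long.parseLong(n[2]) > Long.parseLong(o[2]);
    }
}
```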
                
> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5270
>                 URL: https://issues.apache.org/jira/browse/HBASE-5270
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master
>            Reporter: Zhihong Yu
>             Fix For: 0.94.0, 0.92.1
>
>
> This JIRA continues the effort from HBASE-5179. Starting with Stack's 
> comments about patches for 0.92 and TRUNK:
> Reviewing 0.92v17
> isDeadServerInProgress is a new public method in ServerManager but it does 
> not seem to be used anywhere.
> Does isDeadRootServerInProgress need to be public? Ditto for meta version.
> This method param names are not right 'definitiveRootServer'; what is meant 
> by definitive? Do they need this qualifier?
> Is there anything in place to stop us expiring a server twice if it's carrying 
> root and meta?
> What is difference between asking assignment manager isCarryingRoot and this 
> variable that is passed in? Should be doc'd at least. Ditto for meta.
> I think I've asked for this a few times - onlineServers needs to be 
> explained... either in javadoc or in comment. This is the param passed into 
> joinCluster. How does it arise? I think I know but am unsure. God love the 
> poor noob that comes awandering this code trying to make sense of it all.
> It looks like we get the list by trawling zk for regionserver znodes that 
> have not checked in. Don't we do this operation earlier in master setup? Are 
> we doing it again here?
> Though distributed split log is configured, we will do in master single 
> process splitting under some conditions with this patch. It's not explained in 
> code why we would do this. Why do we think master log splitting 'high 
> priority' when it could very well be slower. Should we only go this route if 
> distributed splitting is not going on. Do we know if concurrent distributed 
> log splitting and master splitting works?
> Why would we have dead servers in progress here in master startup? Because a 
> servershutdownhandler fired?
> This patch is different to the patch for 0.90. Should go into trunk first 
> with tests, then 0.92. Should it be in this issue? This issue is really hard 
> to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
> this trunk patch?
> This patch needs to have the v18 differences applied.

