[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188954#comment-13188954
 ] 

stack commented on HBASE-5179:
------------------------------

Regarding the new splitLog method in v11, I don't follow this justification, 
Zhihong:

{code}
"What I am thinking is that maybe we should split currentMetaServer's log in a 
non-distributed fashion because the splitting is of high priority."
{code}

Is the thought that local splitting will run faster?  Is this true?

The areDeadServersInProgress method name should match the other method names, 
so it should be areDeadServersBeingProcessed (minor).  Ditto for 
isDeadRootServerInProgress, etc.  What's the difference between InProgress and 
BeingProcessed?  We also seem to have a third form, UnderProcessing, going on.  
Shouldn't these be consistent?

Do these need to be public?  They seem to be used only within the same 
package, by the master.
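
To make the naming and visibility point concrete, here is a minimal sketch of 
what I have in mind.  It is not the patch's code: DeadServerTracker, its 
fields, and the startProcessing/finishProcessing/setRootServer helpers are all 
made up for illustration.  Only the BeingProcessed suffix and the 
package-private visibility are the actual suggestion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder for dead-server state; NOT the class from the patch.
class DeadServerTracker {
  // Servers whose shutdown handling (log splitting included) has not finished.
  private final Set<String> beingProcessed = new HashSet<String>();
  // Name of the server that was carrying -ROOT-, if known (assumed field).
  private String rootServer;

  synchronized void setRootServer(String serverName) {
    this.rootServer = serverName;
  }

  synchronized void startProcessing(String serverName) {
    beingProcessed.add(serverName);
  }

  synchronized void finishProcessing(String serverName) {
    beingProcessed.remove(serverName);
  }

  // Package-private on purpose: only callers in the master package need these.
  // Both use the same BeingProcessed suffix.
  synchronized boolean areDeadServersBeingProcessed() {
    return !beingProcessed.isEmpty();
  }

  synchronized boolean isDeadRootServerBeingProcessed() {
    return rootServer != null && beingProcessed.contains(rootServer);
  }
}
```

With one suffix throughout, a reader does not have to wonder whether 
InProgress and BeingProcessed mean different states.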

Should the zk callback be up and operating before the master comes completely 
online?

The knownServers in HMaster, are these heartbeating servers that checked in 
before the master came online?  That seems like an important fix.

I'm now a little confused as to the scope of this patch.  I get Jinchao's 
descriptions above of how to reproduce the pathological situations.  It'd be 
great to do these up as unit tests.  I'm not sure which of Jinchao's 
descriptions apply to TRUNK as opposed to 0.90.  Any chance of getting a list 
of the scenarios this patch is supposed to fix?  If we had that, I'd be up for 
writing unit tests for TRUNK at least (I think it has sufficient primitives 
for mocking up Jinchao's descriptions w/o need of a cluster).
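
As a rough idea of the invariant one such test could check for the scenario in 
the issue description (regions of a restarted server being assigned while its 
log is still being split), here is a toy model.  It uses no HBase classes at 
all; FailoverModel and every method on it are hypothetical names, and a real 
test would drive the master's own primitives instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of master-side bookkeeping; none of these names come from HBase.
class FailoverModel {
  // Servers whose logs are currently being split.
  private final Set<String> serversBeingSplit = new HashSet<String>();
  // Regions held back until their old server's log split completes.
  private final Map<String, List<String>> deferred =
      new HashMap<String, List<String>>();
  private final List<String> assigned = new ArrayList<String>();

  // Stands in for ServerShutdownHandler starting to split a dead server's log.
  synchronized void startLogSplit(String server) {
    serversBeingSplit.add(server);
  }

  // Log split finished: regions held back for this server may now be assigned.
  synchronized void finishLogSplit(String server) {
    serversBeingSplit.remove(server);
    List<String> waiting = deferred.remove(server);
    if (waiting != null) {
      assigned.addAll(waiting);
    }
  }

  // Stands in for the failover path. Returns false if assignment was deferred.
  synchronized boolean assign(String region, String oldServer) {
    if (serversBeingSplit.contains(oldServer)) {
      List<String> waiting = deferred.get(oldServer);
      if (waiting == null) {
        waiting = new ArrayList<String>();
        deferred.put(oldServer, waiting);
      }
      waiting.add(region);
      return false;
    }
    assigned.add(region);
    return true;
  }

  synchronized List<String> assignedRegions() {
    return new ArrayList<String>(assigned);
  }
}
```

The assertion a test would make is simply that no region of a server shows up 
in assignedRegions() between startLogSplit and finishLogSplit for that server.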

That said, this patch and the discussion above in this issue are uncovering 
critical stuff; thanks for all the work, lads.

> Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
> region to be assigned before log splitting is completed, causing data loss
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5179
>                 URL: https://issues.apache.org/jira/browse/HBASE-5179
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.6
>
>         Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
> 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 
> 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 
> 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, 
> hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, 
> hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may occur:
> 1. master completes splitLogAfterStartup()
> 2. RegionserverA restarts, and ServerShutdownHandler is processing it.
> 3. master starts rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. master starts to assign the regions of RegionserverA because it is a dead 
> server per step 3.
> However, while step 4 (assigning regions) is under way, ServerShutdownHandler 
> may still be splitting the log, so data loss may result.
