[jira] [Commented] (HBASE-11906) Meta data loss with distributed log replay

stack (JIRA) Thu, 11 Sep 2014 09:23:01 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130231#comment-14130231
 ]


stack commented on HBASE-11906:
-------------------------------

Why the removal of the sync?

Putting it into meta would be more 'coherent'.  We will have interesting 
issues when meta is occasionally unavailable (say, meta crashes at about 
the same time), and we'll need to give up if the server is supposed to be 
shutting down, but it'd be tidier, no?  We write a seqid to the master's memory 
now so we can skip edits during log splitting.  Could we use that mechanism?  
Can the master hold the seqid during recovery, or must the id persist?
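
As a rough illustration of the seqid mechanism mentioned above (a seqid held 
in master memory so that already-persisted edits can be skipped during log 
splitting), here is a minimal sketch. The class and method names 
(LastFlushedSeqIds, record, canSkip) are hypothetical and are not HBase's 
actual API; the only assumption is that an edit whose seqid is at or below 
the recorded seqid for its region does not need to be replayed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; these are not HBase classes.
class LastFlushedSeqIds {
    // Region name -> highest sequence id reported as persisted for that region.
    private final Map<String, Long> lastFlushed = new HashMap<>();

    // Record a reported seqid, keeping the maximum so that a late or
    // out-of-order report cannot move the marker backwards.
    void record(String region, long seqId) {
        lastFlushed.merge(region, seqId, Long::max);
    }

    // An edit may be skipped only if we know of a persisted seqid for the
    // region and the edit's seqid is at or below it.
    boolean canSkip(String region, long editSeqId) {
        Long flushed = lastFlushed.get(region);
        return flushed != null && editSeqId <= flushed;
    }

    public static void main(String[] args) {
        LastFlushedSeqIds ids = new LastFlushedSeqIds();
        ids.record("regionA", 100L);
        System.out.println(ids.canSkip("regionA", 90L));   // true
        System.out.println(ids.canSkip("regionA", 101L));  // false
        System.out.println(ids.canSkip("regionB", 1L));    // false: nothing recorded
    }
}
```

Because the map lives only in memory, everything in it is gone if the holder 
restarts, which is the crux of the "must the id persist?" question.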

Good stuff lads (figuring and fixing)

> Meta data loss with distributed log replay
> ------------------------------------------
>
>                 Key: HBASE-11906
>                 URL: https://issues.apache.org/jira/browse/HBASE-11906
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0, 2.0.0
>            Reporter: Jimmy Xiang
>            Assignee: Jeffrey Zhong
>         Attachments: HBASE-11906.patch, debugging.patch, 
> meta-data-loss-2.log, meta-data-loss-with-dlr.log
>
>
> In the attached log, you can see that, before log replay, the region is open 
> on e1205:
> {noformat}
> A3. 2014-09-05 16:38:46,705 INFO  
> [B.defaultRpcServer.handler=5,queue=2,port=20020] master.RegionStateStore: 
> Updating row 
> IntegrationTestBigLinkedList,\x90Jy\x04\xA7\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7.
>  with 
> state=OPEN&openSeqNum=40118237&server=e1205.halxg.cloudera.com,20020,1409960280431
> {noformat}
> After the log replay, meta reports the region as open on e1209:
> {noformat}
> A4. 2014-09-05 16:41:12,257 INFO  [ActiveMasterManager] 
> master.AssignmentManager: Loading from meta: 
> {cbb0d736ebfabcf4a07e5a7b395fcdf7 state=OPEN, ts=1409960472257, 
> server=e1209.halxg.cloudera.com,20020,1409959391651}
> {noformat}
> The replayed edits show that the log does contain the expected edit:
> {noformat}
> 2014-09-05 16:41:11,862 INFO  
> [B.defaultRpcServer.handler=18,queue=0,port=20020] 
> regionserver.RSRpcServices: Meta replay edit 
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"e1205.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x01HH.\\x81o","qualifier":"serverstartcode","vlen":8},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x00\\x00\\x02d'\\xDD","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409960326706,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
> Why did we pick up a wrong value with an older timestamp?
> {noformat}
> 2014-09-05 16:41:11,063 INFO  
> [B.defaultRpcServer.handler=9,queue=0,port=20020] regionserver.RSRpcServices: 
> Meta replay edit 
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"e1209.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x01HH \\xF1\\xA3","qualifier":"serverstartcode","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x00\\x00\\x00\\x01\\xB7\\xAB","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
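
The escaped 8-byte values in the two replayed edits quoted above can be 
cross-checked against the master log lines by decoding them as big-endian 
longs. The sketch below uses only java.nio.ByteBuffer; the class and helper 
names (MetaValueDecoder, toLong) are made up for illustration, and the byte 
values are transcribed by hand from the quoted log output.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for reading the escaped values in the replay log lines.
class MetaValueDecoder {
    // Interpret exactly 8 byte values (given as ints or chars) as a big-endian long.
    static long toLong(int... bytes) {
        if (bytes.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.allocate(8);
        for (int b : bytes) {
            buf.put((byte) b);
        }
        buf.flip();
        return buf.getLong();
    }

    public static void main(String[] args) {
        // e1205 edit: seqnumDuringOpen = \x00\x00\x00\x00\x02d'\xDD
        long seqE1205 = toLong(0, 0, 0, 0, 0x02, 'd', '\'', 0xDD);
        // e1205 edit: serverstartcode = \x00\x00\x01HH.\x81o
        long startE1205 = toLong(0, 0, 0x01, 'H', 'H', '.', 0x81, 'o');
        // e1209 edit: seqnumDuringOpen = \x00\x00\x00\x00\x00\x01\xB7\xAB
        long seqE1209 = toLong(0, 0, 0, 0, 0, 0x01, 0xB7, 0xAB);

        System.out.println(seqE1205);    // 40118237, the openSeqNum in line A3
        System.out.println(startE1205);  // 1409960280431, the e1205 start code in line A3
        System.out.println(seqE1209);    // 112555

        // Cell timestamps from the two edits: the e1205 edit is the newer one.
        System.out.println(1409960326705L > 1409959994634L);  // true
    }
}
```

So the e1205 edit carries both the higher open seqnum and the newer cell 
timestamp, yet the e1209 values are what meta returned after replay.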



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
