[
https://issues.apache.org/jira/browse/HBASE-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127486#comment-14127486
]
Jeffrey Zhong commented on HBASE-11906:
---------------------------------------
{quote}
I was wondering why the seq number decreased. Is it because when the meta table
is opened again, it doesn't know the latest seq number? If so, if all edits in
the log are replayed in order, the seq number should not decrease, right?
{quote}
You're right. A new seqId should always be increasing. In the replay case, we
also bump up the value when opening a region. It seems that we opened the meta
region with a wrong seqId value. You can confirm that by checking the output of
the following log statement:
{code}
LOG.info("Onlined " + this.getRegionInfo().getShortNameToLog() +
    "; next sequenceid=" + nextSeqid);
{code}
Let me try your patch to see if I can find some info. Thanks.
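
To make that check less manual, here is a rough sketch that scans region server log lines for the "Onlined ...; next sequenceid=" message and reports any region that was reopened with a lower value than on its previous open. This is not HBase code: the class name OnlinedSeqIdCheck, the method findDecreases, and the regular expression are assumptions made for illustration, and the pattern assumes the short region name contains no whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags regions whose "next sequenceid" went backwards between opens. */
final class OnlinedSeqIdCheck {
    // Assumed shape of the message written by the LOG.info call quoted above.
    private static final Pattern ONLINED =
        Pattern.compile("Onlined (\\S+); next sequenceid=(\\d+)");

    /** Returns one entry per region open whose sequence id is lower than the previous open's. */
    static List<String> findDecreases(List<String> logLines) {
        Map<String, Long> lastSeen = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ONLINED.matcher(line);
            if (!m.find()) {
                continue; // not an "Onlined" line
            }
            String region = m.group(1);
            long seqId = Long.parseLong(m.group(2));
            // put() returns the value recorded at the previous open, if any.
            Long previous = lastSeen.put(region, seqId);
            if (previous != null && seqId < previous) {
                problems.add(region + ": " + previous + " -> " + seqId);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws java.io.IOException {
        // Usage: java OnlinedSeqIdCheck.java <path-to-regionserver-log>
        List<String> lines =
            java.nio.file.Files.readAllLines(java.nio.file.Paths.get(args[0]));
        findDecreases(lines).forEach(System.out::println);
    }
}
```

The lines must be fed in chronological order. An empty result only means no decrease was seen within the logs that were scanned.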
> Meta data loss with distributed log replay
> ------------------------------------------
>
> Key: HBASE-11906
> URL: https://issues.apache.org/jira/browse/HBASE-11906
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Jimmy Xiang
> Attachments: debugging.patch, meta-data-loss-2.log,
> meta-data-loss-with-dlr.log
>
>
> In the attached log, you can see that, before log replay, the region is open
> on e1205:
> {noformat}
> A3. 2014-09-05 16:38:46,705 INFO
> [B.defaultRpcServer.handler=5,queue=2,port=20020] master.RegionStateStore:
> Updating row
> IntegrationTestBigLinkedList,\x90Jy\x04\xA7\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7.
> with
> state=OPEN&openSeqNum=40118237&server=e1205.halxg.cloudera.com,20020,1409960280431
> {noformat}
> After the log replay, meta reports the region as open on e1209:
> {noformat}
> A4. 2014-09-05 16:41:12,257 INFO [ActiveMasterManager]
> master.AssignmentManager: Loading from meta:
> {cbb0d736ebfabcf4a07e5a7b395fcdf7 state=OPEN, ts=1409960472257,
> server=e1209.halxg.cloudera.com,20020,1409959391651}
> {noformat}
> The replayed edits show the log does have the edit expected:
> {noformat}
> 2014-09-05 16:41:11,862 INFO
> [B.defaultRpcServer.handler=18,queue=0,port=20020]
> regionserver.RSRpcServices: Meta replay edit
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"e1205.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x01HH.\\x81o","qualifier":"serverstartcode","vlen":8},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x00\\x00\\x02d'\\xDD","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409960326706,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
> Why did we pick up a wrong value with an older timestamp?
> {noformat}
> 2014-09-05 16:41:11,063 INFO
> [B.defaultRpcServer.handler=9,queue=0,port=20020] regionserver.RSRpcServices:
> Meta replay edit
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"e1209.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x01HH \\xF1\\xA3","qualifier":"serverstartcode","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x00\\x00\\x00\\x01\\xB7\\xAB","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
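
To compare the two replayed edits numerically, the seqnumDuringOpen values can be decoded from their escaped form. The sketch below assumes the values are 8-byte big-endian longs printed with \xNN escapes for non-printable bytes (the log shows each backslash doubled because of JSON escaping). SeqNumDecoder and decode are hypothetical names, not part of HBase.

```java
/** Hypothetical helper: turns an escaped 8-byte value from the log into a long. */
final class SeqNumDecoder {
    /**
     * Decodes a string in which printable bytes appear as-is and other bytes
     * appear as \xNN, interpreting the 8 resulting bytes as a big-endian long.
     */
    static long decode(String escaped) {
        long value = 0;
        int count = 0;
        int i = 0;
        while (i < escaped.length()) {
            int b;
            if (escaped.startsWith("\\x", i) && i + 4 <= escaped.length()) {
                // Escaped byte: two hex digits follow the "\x" prefix.
                b = Integer.parseInt(escaped.substring(i + 2, i + 4), 16);
                i += 4;
            } else {
                // Printable byte stored as the character itself.
                b = escaped.charAt(i) & 0xFF;
                i += 1;
            }
            value = (value << 8) | b;
            count++;
        }
        if (count != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + count);
        }
        return value;
    }

    public static void main(String[] args) {
        // seqnumDuringOpen from the edit that names e1205
        System.out.println(decode("\\x00\\x00\\x00\\x00\\x02d'\\xDD"));          // prints 40118237
        // seqnumDuringOpen from the edit that names e1209
        System.out.println(decode("\\x00\\x00\\x00\\x00\\x00\\x01\\xB7\\xAB")); // prints 112555
    }
}
```

Under these assumptions the e1205 edit decodes to 40118237, which matches the openSeqNum in the RegionStateStore line above, while the e1209 edit decodes to the much smaller 112555.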