[
https://issues.apache.org/jira/browse/HBASE-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127454#comment-14127454
]
Jeffrey Zhong commented on HBASE-11906:
---------------------------------------
[~jxiang] Where is your second log line printed? I'm assuming your first log
line "Log replay mvccNum = (seq id)40002911" comes from the following line in
HRegion#doMiniBatchMutation:
{code}
mvccNum = batchOp.getReplaySequenceId();
{code}
The basic idea is that we use the original SeqId for replaying. The replayed
edit is also assigned a new SeqId and is written into the replaying RS's WAL
with that new SeqId, along with the original SeqId stored in
HLogKey#origLogSeqNum. During replay, the region read point is bigger than the
current replay edit's mvcc value. Since we don't allow reads during recovery,
that is fine here.
If you open a scanner before the recovery process (with a smaller read point),
the scan will only see updates up to its read point, not the newest ones.
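To make the above concrete, here is a minimal toy model in Java. It is only a
sketch and not HBase code: the class ReplaySketch, its Region and Edit types,
the visibleTo method and the starting sequence id 50000000 are all made up for
illustration. Only the original SeqId 40002911 comes from the log line quoted
above. It shows a replayed edit keeping its original SeqId as the mvcc value
while getting a fresh SeqId for the WAL, and a scanner seeing only edits at or
below its own read point.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the replay idea; none of these types are real HBase classes. */
public class ReplaySketch {

    /** One replayed edit as this sketch models it. */
    public static final class Edit {
        public final String value;
        public final long mvccNum;       // original SeqId, used for read visibility
        public final long walSeqId;      // new SeqId assigned by the replaying server
        public final long origLogSeqNum; // original SeqId stored next to the new one

        Edit(String value, long mvccNum, long walSeqId, long origLogSeqNum) {
            this.value = value;
            this.mvccNum = mvccNum;
            this.walSeqId = walSeqId;
            this.origLogSeqNum = origLogSeqNum;
        }
    }

    /** A region with a sequence id counter and a read point (hypothetical). */
    public static final class Region {
        private final List<Edit> memstore = new ArrayList<>();
        private final long readPoint;
        private long nextSeqId;

        public Region(long openSeqId) {
            this.readPoint = openSeqId;
            this.nextSeqId = openSeqId;
        }

        public long readPoint() {
            return readPoint;
        }

        /** Replays an edit: mvcc uses the original SeqId, the WAL gets a new one. */
        public Edit replay(String value, long originalSeqId) {
            long newSeqId = ++nextSeqId;
            Edit edit = new Edit(value, originalSeqId, newSeqId, originalSeqId);
            memstore.add(edit);
            return edit;
        }

        /** Returns the values a scanner with the given read point would see. */
        public List<String> visibleTo(long scannerReadPoint) {
            List<String> seen = new ArrayList<>();
            for (Edit edit : memstore) {
                if (edit.mvccNum <= scannerReadPoint) {
                    seen.add(edit.value);
                }
            }
            return seen;
        }
    }

    public static void main(String[] args) {
        // 50000000 is an assumed open sequence id, chosen only for illustration.
        Region region = new Region(50_000_000L);
        Edit edit = region.replay("replayed-put", 40_002_911L);

        System.out.println("mvccNum       = " + edit.mvccNum);
        System.out.println("walSeqId      = " + edit.walSeqId);
        System.out.println("origLogSeqNum = " + edit.origLogSeqNum);
        System.out.println("region read point is bigger: "
            + (region.readPoint() > edit.mvccNum));
        // A scanner opened earlier, with a smaller read point, does not see the edit.
        System.out.println("old scanner sees: " + region.visibleTo(40_000_000L));
        System.out.println("current read point sees: "
            + region.visibleTo(region.readPoint()));
    }
}
```

In this model the replayed edit is visible at the region read point as soon as
it lands, which is why reads have to stay blocked until recovery completes.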
> Meta data loss with distributed log replay
> ------------------------------------------
>
> Key: HBASE-11906
> URL: https://issues.apache.org/jira/browse/HBASE-11906
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Jimmy Xiang
> Attachments: meta-data-loss-2.log, meta-data-loss-with-dlr.log
>
>
> In the attached log, you can see, before log replaying, the region is open on
> e1205:
> {noformat}
> A3. 2014-09-05 16:38:46,705 INFO
> [B.defaultRpcServer.handler=5,queue=2,port=20020] master.RegionStateStore:
> Updating row
> IntegrationTestBigLinkedList,\x90Jy\x04\xA7\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7.
> with
> state=OPEN&openSeqNum=40118237&server=e1205.halxg.cloudera.com,20020,1409960280431
> {noformat}
> After the log replay, meta shows the region as open on e1209:
> {noformat}
> A4. 2014-09-05 16:41:12,257 INFO [ActiveMasterManager]
> master.AssignmentManager: Loading from meta:
> {cbb0d736ebfabcf4a07e5a7b395fcdf7 state=OPEN, ts=1409960472257,
> server=e1209.halxg.cloudera.com,20020,1409959391651}
> {noformat}
> The replayed edits show the log does have the edit expected:
> {noformat}
> 2014-09-05 16:41:11,862 INFO
> [B.defaultRpcServer.handler=18,queue=0,port=20020]
> regionserver.RSRpcServices: Meta replay edit
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"e1205.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x01HH.\\x81o","qualifier":"serverstartcode","vlen":8},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x00\\x00\\x02d'\\xDD","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409960326706,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
> Why did we pick up a wrong value with an older timestamp?
> {noformat}
> 2014-09-05 16:41:11,063 INFO
> [B.defaultRpcServer.handler=9,queue=0,port=20020] regionserver.RSRpcServices:
> Meta replay edit
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"e1209.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x01HH \\xF1\\xA3","qualifier":"serverstartcode","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x00\\x00\\x00\\x01\\xB7\\xAB","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}