[
https://issues.apache.org/jira/browse/HBASE-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136655#comment-14136655
]
Jeffrey Zhong commented on HBASE-11906:
---------------------------------------
{quote}
One question: are we going to create another file later on when moving other
DLR info from ZK to a file?
{quote}
If we have to store the related state somewhere, we can move it to meta or
extend the current seqid files for that purpose.
OK. If there are no objections, I'll check in the v1 patch by the end of
tomorrow, after fixing the small issues mentioned above (such as the hbck
warnings and removing zkw.sync()). Thanks.
> Meta data loss with distributed log replay
> ------------------------------------------
>
> Key: HBASE-11906
> URL: https://issues.apache.org/jira/browse/HBASE-11906
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.99.0, 2.0.0
> Reporter: Jimmy Xiang
> Assignee: Jeffrey Zhong
> Attachments: HBASE-11906.patch, debugging.patch,
> hbase-11906-v2.patch, meta-data-loss-2.log, meta-data-loss-with-dlr.log
>
>
> In the attached log, you can see that before the log replay, the region is
> open on e1205:
> {noformat}
> A3. 2014-09-05 16:38:46,705 INFO
> [B.defaultRpcServer.handler=5,queue=2,port=20020] master.RegionStateStore:
> Updating row
> IntegrationTestBigLinkedList,\x90Jy\x04\xA7\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7.
> with
> state=OPEN&openSeqNum=40118237&server=e1205.halxg.cloudera.com,20020,1409960280431
> {noformat}
> After the log replay, meta reports that the region is open on e1209:
> {noformat}
> A4. 2014-09-05 16:41:12,257 INFO [ActiveMasterManager]
> master.AssignmentManager: Loading from meta:
> {cbb0d736ebfabcf4a07e5a7b395fcdf7 state=OPEN, ts=1409960472257,
> server=e1209.halxg.cloudera.com,20020,1409959391651}
> {noformat}
> The replayed edits show that the log does contain the expected edit:
> {noformat}
> 2014-09-05 16:41:11,862 INFO
> [B.defaultRpcServer.handler=18,queue=0,port=20020]
> regionserver.RSRpcServices: Meta replay edit
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"e1205.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x01HH.\\x81o","qualifier":"serverstartcode","vlen":8},{"timestamp":1409960326705,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"\\x00\\x00\\x00\\x00\\x02d'\\xDD","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409960326706,"tag":["3:\\x00\\x00\\x00\\x00\\x02bad"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
> Why did we pick up the wrong value, which has an older timestamp?
> {noformat}
> 2014-09-05 16:41:11,063 INFO
> [B.defaultRpcServer.handler=9,queue=0,port=20020] regionserver.RSRpcServices:
> Meta replay edit
> type=PUT,mutation={"totalColumns":4,"families":{"info":[{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"e1209.halxg.cloudera.com:20020","qualifier":"server","vlen":30},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x01HH \\xF1\\xA3","qualifier":"serverstartcode","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"\\x00\\x00\\x00\\x00\\x00\\x01\\xB7\\xAB","qualifier":"seqnumDuringOpen","vlen":8},{"timestamp":1409959994634,"tag":["3:\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99"],"value":"OPEN","qualifier":"state","vlen":4}]},"row":"IntegrationTestBigLinkedList,\\x90Jy\\x04\\xA7\\x90Jp,1409959495482.cbb0d736ebfabcf4a07e5a7b395fcdf7."}
> {noformat}
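The two replayed meta edits are easier to compare once their escaped byte values are decoded. Below is a small standalone Java sketch that does this. The class and helper names (`MetaReplayDecode`, `Edit`, `fromStringBinary`, `pickNewest`) are hypothetical. `fromStringBinary` is a dependency-free stand-in for HBase's byte utilities that only undoes the `\xHH` escaping seen in the log.

Decoding `seqnumDuringOpen` for e1205 gives 40118237, which matches openSeqNum in the A3 line. The start codes decode to 1409960280431 and 1409959391651, which match the server names in A3 and A4. This assumes the e1209 start code has a space (0x20) after `HH`, at the point where that log line wraps. Treating tag type 3 as the log-replay tag carrying the original WAL sequence id is also an assumption. `pickNewest` encodes the expectation behind the question above: with one version per cell, the newest timestamp should win.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class MetaReplayDecode {

  /** Hypothetical holder for one replayed meta PUT, with escaped values copied from the log. */
  static final class Edit {
    final String server;
    final long timestamp;
    final String startCode;
    final String seqNumDuringOpen;
    final String replayTag; // value of the type-3 tag (assumed: original WAL sequence id)

    Edit(String server, long timestamp, String startCode, String seqNumDuringOpen, String replayTag) {
      this.server = server;
      this.timestamp = timestamp;
      this.startCode = startCode;
      this.seqNumDuringOpen = seqNumDuringOpen;
      this.replayTag = replayTag;
    }
  }

  /** Undoes the "\xHH" escaping used in the log; every other character stands for itself. */
  static byte[] fromStringBinary(String s) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
        out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
        i += 3; // skip the 'x' and the two hex digits
      } else {
        out.write(c);
      }
    }
    return out.toByteArray();
  }

  /** Reads 8 decoded bytes as a big-endian long (ByteBuffer's default byte order). */
  static long toLong(String escaped) {
    return ByteBuffer.wrap(fromStringBinary(escaped)).getLong();
  }

  /** Returns the edit with the highest timestamp, i.e. what a single-version read is expected to see. */
  static Edit pickNewest(List<Edit> edits) {
    Edit newest = edits.get(0);
    for (Edit e : edits) {
      if (e.timestamp > newest.timestamp) {
        newest = e;
      }
    }
    return newest;
  }

  static final Edit E1205 = new Edit("e1205.halxg.cloudera.com:20020", 1409960326705L,
      "\\x00\\x00\\x01HH.\\x81o", "\\x00\\x00\\x00\\x00\\x02d'\\xDD",
      "\\x00\\x00\\x00\\x00\\x02bad");
  static final Edit E1209 = new Edit("e1209.halxg.cloudera.com:20020", 1409959994634L,
      "\\x00\\x00\\x01HH \\xF1\\xA3", "\\x00\\x00\\x00\\x00\\x00\\x01\\xB7\\xAB",
      "\\x00\\x00\\x00\\x00\\x00\\x00\\x09\\x99");

  public static void main(String[] args) {
    // Same order as in the log: the e1209 edit was replayed first (16:41:11,063).
    List<Edit> replayed = Arrays.asList(E1209, E1205);
    for (Edit e : replayed) {
      System.out.println(e.server + " startcode=" + toLong(e.startCode)
          + " seqnumDuringOpen=" + toLong(e.seqNumDuringOpen)
          + " replayTag=" + toLong(e.replayTag));
    }
    System.out.println("expected in meta: " + pickNewest(replayed).server);
  }
}
```

Both the timestamps and the decoded tag values (40001892 vs. 2457) order the e1205 edit after the e1209 one. Meta should therefore end up pointing at e1205, yet after replay it reported e1209.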
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)