[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156991#comment-13156991 ]
chunhui shen commented on HBASE-4862:
-------------------------------------

@Ted @Todd
I'm sorry my explanation was not clear. I think I should describe the detailed case first.

Throughout the following process, a client is putting data to region C.

1. Region C is successfully moved from server A to server B. At this moment there are log entries for region C in both server A's log file and server B's log file.
2. Kill server A and server B.
3. Restart server B. The master now starts a ServerShutdownHandler for server B and assigns region C to server D.
4. Before region C is opened on server D, restart server A. The master now starts a ServerShutdownHandler for server A and splits server A's log file. Because server A's log file contains log entries for region C (see step 1), the split hlog thread creates a file F in region C's recovered.edits directory.
5. While region C is opening, it executes replayRecoveredEdits() and then deletes file F.
6. Therefore, the split in step 4 throws an IOException saying that file F does not exist, which stops parsing of the current server A hlog file. The other data in this hlog file is lost.

The posted region server log is server B's log, and it is running replayRecoveredEditsIfAny(). Although it prints that the delete of file recovered.edits/0000000013156791680 failed, the file has in fact been deleted, and the master throws a file-not-exist exception:

2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680 File does not exist.

I'm not sure whether this is clear now; I'm waiting for your questions.
Thanks!
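
To make the race in steps 4 to 6 concrete, here is a minimal Java sketch of one possible way to keep the region-open path from deleting a recovered.edits file that the split thread is still writing. It is an illustration only, not necessarily what 4862.patch does: the class RecoveredEditsSketch, its methods, and the ".temp" suffix convention are all hypothetical, and it uses the local filesystem through java.nio.file instead of HDFS. The idea is that the splitter writes under a temporary name and renames the file once it is complete, while the replay side ignores temporary names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch; none of these names come from HBase. */
public class RecoveredEditsSketch {
    // Assumed marker for files the split thread is still appending to.
    static final String TEMP_SUFFIX = ".temp";

    /** Splitter side: the path to append to while splitting is in progress. */
    public static Path tempPathFor(Path editsDir, String name) {
        return editsDir.resolve(name + TEMP_SUFFIX);
    }

    /** Splitter side: publish a completed file by renaming away the suffix. */
    public static Path finish(Path temp) throws IOException {
        String n = temp.getFileName().toString();
        String finalName = n.substring(0, n.length() - TEMP_SUFFIX.length());
        return Files.move(temp, temp.resolveSibling(finalName),
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Region-open side: "replay" and delete only completed files. */
    public static List<String> replayAndDelete(Path editsDir) throws IOException {
        List<Path> finished = new ArrayList<>();
        try (Stream<Path> files = Files.list(editsDir)) {
            // Skip anything the splitter has not published yet.
            files.filter(p -> !p.getFileName().toString().endsWith(TEMP_SUFFIX))
                 .sorted()
                 .forEach(finished::add);
        }
        List<String> replayed = new ArrayList<>();
        for (Path p : finished) {
            replayed.add(p.getFileName().toString()); // stand-in for real replay
            Files.delete(p);
        }
        return replayed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        Path temp = tempPathFor(dir, "0000000013156791680");
        Files.write(temp, "edit-1\n".getBytes(StandardCharsets.UTF_8));

        // Region opens while the split is still running: file F is left alone.
        System.out.println("replayed during split: " + replayAndDelete(dir));
        System.out.println("temp file still present: " + Files.exists(temp));

        // Split finishes; a later replay sees and deletes the completed file.
        finish(temp);
        System.out.println("replayed after split: " + replayAndDelete(dir));
    }
}
```

This only addresses the deletion race. A region that opens before the split finishes would still miss the edits that are published afterwards, so a real fix also has to make sure the region is not opened (or is reopened) until its log splitting is complete.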

> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A, and during replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log file. If skipError = true, it adds the file to the corrupt logs. However, the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is fine, so it only prints an error log and continues assigning regions. Therefore, the data in the other log files is also lost!
> The case may happen as follows:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits file that the split hlog thread is still appending to.