[
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156965#comment-13156965
]
chunhui shen commented on HBASE-4862:
-------------------------------------
@Ted Yu @Todd Lipcon
It will happen concurrently in the following case:
1.Move region from server A to server B (for example,do balance)
2.kill server A and Server B
3.restart server A and Server B immediately
Before we restart server A and Server B, log data about this region appear in
the both server's log file,
4.After we restart server B, serverShutdownHandler process this dead server ,
and assign this region,
5.At the same time, serverShutdownHandler would process dead server B, and
split server B's hlog
because 4 and 5 is concurrent, replayRecoveredEditsIfAny in 4 and appending log
entry for this region's
recoverd.edit file are concurrent. So, when the recoverd.edit file deleted by
replayRecoveredEdits, exception is thrown.
master and region server log in this case as the following:
master log:
2011-11-16 11:50:13,037 FATAL
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while
writing log entry to log
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
File does not exist. [Lease. Holder:
DFSClient_hb_m_dw75.kgb.sqa.cm4:60000_1321413286871, pendingcreates: 54]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:49)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:962)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:926)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:898)
regionserver log:
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion:
Failed delete of
hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156791680
2011-11-16 11:49:49,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Deleted recovered.edits
file=hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/0000000013156800103
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case Description:
> 1.Split hlog thread creat writer for the file region A/recoverd.edits/123456
> and is appending log entry
> 2.Regionserver is opening region A now, and in the process
> replayRecoveredEditsIfAny() ,it will delete the file region
> A/recoverd.edits/123456
> 3.Split hlog thread catches the io exception, and stop parse this log file
> and if skipError = true , add it to the corrupt logs....However, data in
> other regions in this log file will loss
> 4.Or if skipError = false, it will check filesystem.Of course, the file
> system is ok , and it only prints a error log, continue assigning regions.
> Therefore, data in other log files will also loss!!
> The case may happen in the following:
> 1.Move region from server A to server B
> 2.kill server A and Server B
> 3.restart server A and Server B
> We could prevent this exception throuth forbiding deleting recover.edits
> file
> which is appending by split hlog thread
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira