[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549767 ]
stack commented on HADOOP-2283:
-------------------------------

Today over on Paul's machines I see this around a new failure to rerun edits:

{code}
2007-12-08 00:03:47,981 DEBUG hbase.HLog - Splitting 0 of 11: hdfs://img671:9000/hbase/log_XX.XX.XX.101_1197080909551_60020/hlog.dat.810
2007-12-08 00:03:48,243 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_666176028/oldlogfile.log
2007-12-08 00:03:48,249 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1820335982/oldlogfile.log
2007-12-08 00:03:48,255 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_-1253552729/oldlogfile.log
....
2007-12-08 00:03:49,599 DEBUG hbase.HLog - Applied 5500 edits
2007-12-08 00:03:49,612 DEBUG hbase.HLog - Applied 5600 edits
2007-12-08 00:03:49,627 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1117809784/oldlogfile.log
2007-12-08 00:03:49,646 DEBUG hbase.HLog - Applied 5700 edits
2007-12-08 00:03:49,660 DEBUG hbase.HLog - Applied 5800 edits
2007-12-08 00:03:49,673 DEBUG hbase.HLog - Applied 5900 edits
....
2007-12-08 00:03:52,035 DEBUG hbase.HLog - Applied 30000 edits
2007-12-08 00:03:52,036 DEBUG hbase.HLog - Applied 30006 total edits
2007-12-08 00:03:52,036 DEBUG hbase.HLog - Splitting 1 of 11: hdfs://img671:9000/hbase/log_XX.XX.XX.101_1197080909551_60020/hlog.dat.811
2007-12-08 00:03:52,064 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1117809784/oldlogfile.log
2007-12-08 00:04:34,397 INFO hbase.HMaster - HMaster.rootScanner scanning meta region {regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.103:60020}
{code}

and then eventually....

{code}
2007-12-08 00:04:52,069 DEBUG retry.RetryInvocationHandler - Exception while invoking create of class org.apache.hadoop.dfs.$Proxy0. Retrying.
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_1117809784/oldlogfile.log for DFSClient_863222988 on client XX.XX.XX.248 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:861)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:817)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:272)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)

    at org.apache.hadoop.ipc.Client.call(Client.java:507)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
    at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1424)
    at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:354)
    at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:122)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:390)
{code}

There's a double attempt at creating the file hregion_1117809784/oldlogfile.log: a writer is created for it while splitting hlog.dat.810 and again while splitting hlog.dat.811, while the first create's lease is still outstanding. Should be easy to fix.
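As a rough illustration of the shape of a fix (not the actual patch): if the split keeps one writer per region for the whole run, instead of creating a new writer for a region's oldlogfile.log each time that region turns up in another input log, the second create() against the namenode never happens. The sketch below is a minimal standalone version of that idea; it assumes the era HLogKey/HLogEdit classes and Hadoop's SequenceFile API, and the class name, method signature, rootDir parameter, and the simplified hregion_<name>/oldlogfile.log path layout are all made up for illustration.

{code}
// Sketch only: per-region writer caching across ALL input logs during log splitting.
// Assumes 0.16-era contrib/hbase classes (HLogKey, HLogEdit) and Hadoop's SequenceFile API.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HLogEdit;  // assumed era package/API
import org.apache.hadoop.hbase.HLogKey;   // assumed era package/API
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SplitLogSketch {
  /**
   * Split the given hlog.dat.N files into one oldlogfile.log per region.
   * The writer map must outlive the loop over input logs: a region that
   * appears in more than one hlog.dat.N then reuses its existing writer
   * instead of asking the namenode to create the same path a second time,
   * which is what raises AlreadyBeingCreatedException above.
   */
  static void splitLog(FileSystem fs, Configuration conf, Path rootDir, Path[] logfiles)
      throws IOException {
    Map<Text, SequenceFile.Writer> logWriters = new HashMap<Text, SequenceFile.Writer>();
    try {
      for (Path logfile : logfiles) {
        SequenceFile.Reader in = new SequenceFile.Reader(fs, logfile, conf);
        try {
          HLogKey key = new HLogKey();
          HLogEdit val = new HLogEdit();
          while (in.next(key, val)) {
            // Copy the region name: the reader reuses the key instance, and a
            // mutable map key would break lookups on the next record.
            Text regionName = new Text(key.getRegionName());
            SequenceFile.Writer w = logWriters.get(regionName);
            if (w == null) {
              // First edit seen for this region in this split run: create its
              // oldlogfile.log exactly once.  Path layout is simplified here;
              // the real code derives the hregion_<encoded> directory name.
              Path oldLog = new Path(new Path(rootDir, "hregion_" + regionName),
                  "oldlogfile.log");
              w = SequenceFile.createWriter(fs, conf, oldLog, HLogKey.class, HLogEdit.class);
              logWriters.put(regionName, w);
            }
            w.append(key, val);
          }
        } finally {
          in.close();
        }
      }
    } finally {
      // Close every per-region writer only after all input logs are replayed.
      for (SequenceFile.Writer w : logWriters.values()) {
        w.close();
      }
    }
  }
}
{code}

The important property is that each oldlogfile.log is created exactly once per split run; since HDFS of this vintage has no append, re-opening an already-created file is not an option, so the writer has to stay open across all input logs and be closed only at the end.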
> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2283
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: compaction.patch, OP_READ.patch
>
>
> Looking in the master for a cluster of ~90 regionservers, the regionserver carrying the ROOT region went down (because it hadn't talked to the master in 30 seconds).
> The master notices the downed regionserver because its lease times out. It then goes to run the server shutdown sequence, but while splitting the regionserver's edit log it gets stuck trying to split the second of three log files.
> Eventually, after ~5 minutes, the second log split throws:
> 34974 2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> 34975 org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
> 34976     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
> 34977     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
> 34978     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
> 34979     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> 34980     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 34981     at java.lang.reflect.Method.invoke(Method.java:597)
> 34982     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 34983     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 34984
> 34985     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 34986     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 34987     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 34988     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 34989     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
> 34990     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on every 5 minutes.
> Because the regionserver that went down had the ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.