[
https://issues.apache.org/jira/browse/HADOOP-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HADOOP-2282:
--------------------------
Description:
Looking at the master log for a cluster of ~90 regionservers, the regionserver
carrying the ROOT region went down (it hadn't talked to the master in 30 seconds).
The master notices the downed regionserver because its lease times out. It then
runs the server-shutdown sequence, but while splitting the regionserver's edit
log it gets stuck on the second of three log files. Eventually, after ~5 minutes,
the second log split throws:
{code}
2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client 38.99.77.80 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
        at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
{code}
And so on, every ~5 minutes.
Because the regionserver that went down was carrying the ROOT region, and because
we are stuck in this eternal loop, ROOT never gets reallocated.
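A minimal sketch of the failure and one possible guard. Presumably the first split
attempt created /hbase/hregion_-1194436719/oldlogfile.log and never closed it, so
every retry of ProcessServerShutdown calls create() on the same path while the
lease from the earlier attempt is still outstanding, and the namenode answers with
AlreadyBeingCreatedException each time. The class and method names below are
hypothetical, not the real HMaster internals, and whether a plain delete is enough
to clear the stale lease would need verifying against the namenode's lease handling:
{code}
// Hypothetical sketch, not the actual HMaster log-split code. It shows one
// way a retry could avoid colliding with the oldlogfile.log left behind by
// a previous, half-finished split attempt.
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OldLogFileGuard {
  /** Open the per-region oldlogfile.log, clearing any leftover from a failed attempt. */
  static FSDataOutputStream openOldLogFile(FileSystem fs, Path oldLog)
      throws IOException {
    if (fs.exists(oldLog)) {
      // A failed earlier attempt left the file behind (and its lease possibly
      // un-released); remove it so this retry's create() is not rejected with
      // AlreadyBeingCreatedException.
      fs.delete(oldLog, false);
    }
    // Today this create() is what throws every ~5 minutes.
    return fs.create(oldLog);
  }
}
{code}
An alternative would be for each attempt to write to a uniquely named temporary file
and rename it into place once closed, so a retry never re-creates a path whose lease
is still held.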
was:
Looking at the master log for a cluster of ~90 regionservers, the regionserver
carrying the ROOT region went down (it hadn't talked to the master in 30 seconds).
The master notices the downed regionserver because its lease times out. It then
runs the server-shutdown sequence, but while splitting the regionserver's edit
log it gets stuck on the second of three log files. Eventually, after ~5 minutes,
the second log split throws:
{code}
2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of 38.99.76.15:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client 38.99.77.80 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
        at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
{code}
And so on, every ~5 minutes.
Because the regionserver that went down was carrying the ROOT region, and because
we are stuck in this eternal loop, ROOT never gets reallocated.
> [hbase] Stuck replay of failed regionserver edits
> -------------------------------------------------
>
> Key: HADOOP-2282
> URL: https://issues.apache.org/jira/browse/HADOOP-2282
> Project: Hadoop
> Issue Type: Bug
> Reporter: stack
> Priority: Minor
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.