[ 
https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549767
 ] 

stack commented on HADOOP-2283:
-------------------------------

Today, over on Paul's machines, I saw the following around a new failure to replay edits:

{code}
2007-12-08 00:03:47,981 DEBUG hbase.HLog - Splitting 0 of 11: hdfs://img671:9000/hbase/log_XX.XX.XX.101_1197080909551_60020/hlog.dat.810
2007-12-08 00:03:48,243 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_666176028/oldlogfile.log
2007-12-08 00:03:48,249 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1820335982/oldlogfile.log
2007-12-08 00:03:48,255 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_-1253552729/oldlogfile.log
....
2007-12-08 00:03:49,599 DEBUG hbase.HLog - Applied 5500 edits
2007-12-08 00:03:49,612 DEBUG hbase.HLog - Applied 5600 edits
2007-12-08 00:03:49,627 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1117809784/oldlogfile.log
2007-12-08 00:03:49,646 DEBUG hbase.HLog - Applied 5700 edits
2007-12-08 00:03:49,660 DEBUG hbase.HLog - Applied 5800 edits
2007-12-08 00:03:49,673 DEBUG hbase.HLog - Applied 5900 edits
....
2007-12-08 00:03:52,035 DEBUG hbase.HLog - Applied 30000 edits
2007-12-08 00:03:52,036 DEBUG hbase.HLog - Applied 30006 total edits
2007-12-08 00:03:52,036 DEBUG hbase.HLog - Splitting 1 of 11: hdfs://img671:9000/hbase/log_XX.XX.XX.101_1197080909551_60020/hlog.dat.811
2007-12-08 00:03:52,064 DEBUG hbase.HLog - Creating new log file writer for path /hbase/hregion_1117809784/oldlogfile.log
2007-12-08 00:04:34,397 INFO  hbase.HMaster - HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: XX.XX.XX.103:60020}
{code}

and then eventually....

{code}
2007-12-08 00:04:52,069 DEBUG retry.RetryInvocationHandler - Exception while invoking create of class org.apache.hadoop.dfs.$Proxy0. Retrying.
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_1117809784/oldlogfile.log for DFSClient_863222988 on client XX.XX.XX.248 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:861)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:817)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:272)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
    at org.apache.hadoop.ipc.Client.call(Client.java:507)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
    at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1424)
    at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:354)
    at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:122)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:390)
{code}


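There's a double attempt at creating the file 
hregion_1117809784/oldlogfile.log: the log above shows a writer being created 
for it while splitting log 0 of 11 (00:03:49,627) and again while splitting 
log 1 of 11 (00:03:52,064).  Should be easy to fix.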
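
For illustration only, here is a minimal sketch of one way to avoid the second 
create: keep a single writer per region for the whole split, across all log 
files, and close them only at the end. This is not HBase or HDFS code and not 
the eventual patch. The names RegionWriterCache, WriterFactory, writerFor and 
closeAll are hypothetical, and java.io.Writer stands in for the real log file 
writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not HBase code: one writer per region for the whole split. */
class RegionWriterCache {

    /** Stand-in (assumption) for whatever actually creates the file on the filesystem. */
    interface WriterFactory {
        Writer create(String path) throws IOException;
    }

    // Lives across ALL log files being split, not just the current one.
    private final Map<String, Writer> writers = new HashMap<>();
    private final WriterFactory factory;

    RegionWriterCache(WriterFactory factory) {
        this.factory = factory;
    }

    /** Returns the region's writer, creating the file only on first use. */
    Writer writerFor(String regionName) throws IOException {
        Writer w = writers.get(regionName);
        if (w == null) {
            w = factory.create("/hbase/" + regionName + "/oldlogfile.log");
            writers.put(regionName, w);
        }
        return w;
    }

    /** Closes every writer once, after the last log file has been split. */
    void closeAll() throws IOException {
        for (Writer w : writers.values()) {
            w.close();
        }
        writers.clear();
    }

    public static void main(String[] args) throws IOException {
        List<String> created = new ArrayList<>();
        RegionWriterCache cache = new RegionWriterCache(path -> {
            created.add(path);            // record each real create call
            return new StringWriter();    // in-memory stand-in for a file
        });
        cache.writerFor("hregion_1117809784"); // first log file: creates
        cache.writerFor("hregion_1117809784"); // second log file: reuses
        cache.closeAll();
        System.out.println("create calls: " + created.size());
    }
}
```

The sketch sidesteps the question of what to do when oldlogfile.log already 
exists from an earlier, failed split attempt; that case would still need its 
own handling.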

> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed 
> regionserver edits)
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2283
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: compaction.patch, OP_READ.patch
>
>
> Looking in the master log for a cluster of ~90 regionservers, the regionserver 
> carrying the ROOT went down (because it hadn't talked to the master in 30 
> seconds).
> The master notices the downed regionserver because its lease times out. It then 
> runs the server shutdown sequence, but while splitting the regionserver's 
> edit log it gets stuck on the second of three log files. 
> Eventually, after ~5 minutes, the second log split throws:
> 2007-11-26 01:21:23,999 WARN  hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
>     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
>     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on every 5 minutes.
> Because the regionserver that went down held the ROOT region, and because we are 
> stuck in this eternal loop, ROOT never gets reassigned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
