I found that the "ulimit nofile" setting on one node of my cluster had not been raised. My issue may be caused by that. I will retest. Thank you very much, and thanks to J-D as well.
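For anyone comparing nodes, here is a minimal sketch of checking the open-file limit from Python, which could be run on each node under the account that starts the HBase daemons. It only uses the standard `resource` module. The threshold `ASSUMED_MIN_NOFILE` and both function names are assumptions made for illustration, not values or APIs taken from HBase; use whatever limit your cluster actually needs.

```python
import resource

# Hypothetical threshold chosen for illustration only; it is not an HBase setting.
ASSUMED_MIN_NOFILE = 32768


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_limit_sufficient(soft, minimum=ASSUMED_MIN_NOFILE):
    """Return True if the soft limit is unlimited or at least `minimum`."""
    return soft == resource.RLIM_INFINITY or soft >= minimum


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("nofile soft limit:", soft)
    print("nofile hard limit:", hard)
    if not is_limit_sufficient(soft):
        print("soft limit is below the assumed minimum of", ASSUMED_MIN_NOFILE)
```

The limit is per process and inherited from the shell that launched it, so a node where the setting was changed but the daemons were not restarted from a fresh login may still report the old value.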
Refer to: item 6 of http://wiki.apache.org/hadoop/Hbase/FAQ

On Fri, Mar 13, 2009 at 6:09 PM, schubert zhang <[email protected]> wrote:

> This time, another region went missing, and I used close_region
> 'REGIONNAME' to close it. But then all regions after this one went missing
> from the web GUI, although I can still find them when I scan '.META.' :-(
> Note: in this case, there is no log info from the -ROOT- table.
>
>
> On Fri, Mar 13, 2009 at 1:10 AM, schubert zhang <[email protected]> wrote:
>
>> Thank you stack, it seems to be HBASE-1121. I will continue to track it.
>> Sorry, the log files have been removed.
>>
>>
>> On Fri, Mar 13, 2009 at 12:29 AM, stack <[email protected]> wrote:
>>
>>> Hey Schubert:
>>>
>>> Just FYI, after noticing the mismatch, rather than restart the whole
>>> cluster, you might try closing the single region. That can jog the master
>>> into noticing it has a bad assignment. To do this, in the shell type
>>> 'tools' and you'll see some admin facility.
>>>
>>> The root problem seems to be an issue fixed in the new hbase 0.19.1 release
>>> candidate: See HBASE-1121 'Cluster confused about where -ROOT- is'.
>>>
>>> Worrying is that even after a restart, you cannot get to the troublesome
>>> region. Is it deployed on a regionserver? If so, anything pertinent in the
>>> logs regards this region?
>>>
>>> St.Ack
>>>
>>> On Thu, Mar 12, 2009 at 4:31 AM, schubert zhang <[email protected]> wrote:
>>>
>>> > oh, it is not fine.
>>> > Now, I can find:
>>> > TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236847258901
>>> > <http://nd0-rack0-cloud:60010/regionhistorian.jsp?regionname=WAPCDR,13575565...@2008-12-01%2017:16:55.117,1236847258901>
>>> > nd1-rack0-cloud:60020 <http://nd1-rack0-cloud:60030/> 916003194
>>> > 13575565...@2008-12-01 17:16:55.117 13576301...@2008-12-08 13:57:43.163
>>> >
>>> > but when I try to get 13575565...@2008-12-01 17:16:55.117, nothing is
>>> > returned. It seems this region is gone.
>>> >
>>> >
>>> > On Thu, Mar 12, 2009 at 7:09 PM, schubert zhang <[email protected]> wrote:
>>> >
>>> > > Hi all,
>>> > > Today, I encountered a new issue: a failure to commit a batchUpdate.
>>> > >
>>> > > I am running a program to insert rows into an HBase table, but after a
>>> > > long time of batchUpdating, the following exception occurs:
>>> > >
>>> > > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>>> > > region server Some server for region
>>> > > TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236847258901,
>>> > > row '13575581...@2008-12-06 06:15:48.077', but failed after 10 attempts.
>>> > > Exceptions:
>>> > >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>>> > >     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>>> > >     at org.apache.hadoop.hbase.client.HTable.close(HTable.java:1385)
>>> > >     ......
>>> > >
>>> > > And after waiting for a long time, I still cannot insert new data.
>>> > >
>>> > > Then I checked the HBase status: the master and all regionservers are
>>> > > running.
>>> > >
>>> > > But I found a mismatch for region
>>> > > "TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236847258901".
>>> > > The metadata says this region is served by 10.24.1.12, but when I
>>> > > check 10.24.1.12, the region is not there.
>>> > > I then stopped the whole HBase cluster and started it again. Region
>>> > > locations were re-structured and everything seems OK.
>>> > >
>>> > > In the log file of 10.24.1.12, I found the following exceptions:
>>> > >
>>> > > 836118938_60020/hlog.dat.1236849158178, entries=100010.
>>> > > New log writer: /hbase/log_10.24.1.12_1236836118938_60020/hlog.dat.1236849168393
>>> > > 2009-03-12 17:12:49,298 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236847258901 in 48sec
>>> > > 2009-03-12 17:12:49,298 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting split of region TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236847258901
>>> > > 2009-03-12 17:12:50,648 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236847258901
>>> > > 2009-03-12 17:12:50,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236849169299/1762744366 available
>>> > > 2009-03-12 17:12:50,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236849169299
>>> > > 2009-03-12 17:12:50,865 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TESTTABLE,13575590...@2008-12-16 15:49:40.143,1236849169299/1344805089 available
>>> > > 2009-03-12 17:12:50,865 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TESTTABLE,13575590...@2008-12-16 15:49:40.143,1236849169299
>>> > > 2009-03-12 17:29:15,495 WARN org.apache.hadoop.hbase.RegionHistorian: Unable to 'Region split from: WAPCDR,13575565...@2008-12-01 17:16:55.117,1236847258901'
>>> > > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server for region , row 'TESTTABLE,13575565...@2008-12-01 17:16:55.117,1236849169299', but failed after 11 attempts.
>>> > > Exceptions:
>>> > > org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>>> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>>> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1546)
>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> > >     at java.lang.reflect.Method.invoke(Method.java:597)
>>> > >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>>> > >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>>> > >
>>> > > org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>>> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>>> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1546)
>>> > >     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> > >     at java.lang.reflect.Method.invoke(Method.java:597)
>>> > >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>>> > >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>>> > >
>>> > > org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
