I am having problems with my HRegionServers aborting following rolling
their logs and thus not being able to report to the HMaster for a
while. Eventually all 5 HRegionservers have exited and my MR which is
writing to HBase fails.
Reviewing the logs (with debugging on), I believe what is happening is
the Memcache goes negative and compaction and rolling logs begins.
After this is complete, the logs shows a FATAL error about not being
able to report to the Master and then it exits. It takes about 45 mins
to complete the slow death across all the HRservers
I have pasted the logs of one of the HRegionServers (omitting repeated
log rolling reports, null pointer exception for closed regions and
region closing messages.) Can someone point out what might be the
problem? Thanks in advance,
Clay
2008-07-23 16:20:32,991 WARN org.apache.hadoop.hbase.HRegion: Memcache
size went negative -59248835; resetting
2008-07-23 16:20:33,003 DEBUG org.apache.hadoop.hbase.HRegion:
Finished memcache flush for region anansi,http://org.himssconference.www/,1216843897872
in 16814ms, sequence id=106628188
2008-07-23 16:20:33,003 DEBUG org.apache.hadoop.hbase.HRegionServer:
Compaction requested for region: anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:20:33,004 INFO org.apache.hadoop.hbase.HRegion: checking
compaction on region anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:20:33,032 INFO org.apache.hadoop.hbase.HRegion: checking
compaction completed on region anansi,http://org.himssconference.www/,1216843897872
; status: false;
2008-07-23 16:26:09,289 DEBUG org.apache.hadoop.hbase.HLog: Closing
current log writer /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.060
2008-07-23 16:26:09,298 INFO org.apache.hadoop.hbase.HLog: New log
writer created at /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.061
2008-07-23 16:26:14,494 INFO org.apache.hadoop.hbase.HRegionServer:
Rolling hlog. Number of entries: 30001
2008-07-23 16:26:14,538 DEBUG org.apache.hadoop.hbase.HLog: Closing
current log writer /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.061
2008-07-23 16:26:14,548 INFO org.apache.hadoop.hbase.HLog: New log
writer created at /hbase/log_10.1.2.8_1216843622025_60020/hlog.dat.062
< omitted >
2008-07-23 16:32:58,652 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 410869ms, ten times longer than scheduled: 10000
2008-07-23 16:32:58,652 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 402421ms, ten times longer than scheduled: 3000
2008-07-23 16:32:59,060 FATAL org.apache.hadoop.hbase.HRegionServer:
unable to report to master for 402861 milliseconds - aborting server
2008-07-23 16:32:59,452 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020 closing leases
2008-07-23 16:33:00,294 INFO org.apache.hadoop.hbase.Leases
$LeaseMonitor: regionserver/0.0.0.0:60020.leaseChecker exiting
2008-07-23 16:33:00,534 INFO org.apache.hadoop.hbase.Leases:
regionserver/0.0.0.0:60020 closed leases
2008-07-23 16:33:00,534 INFO org.apache.hadoop.ipc.Server: Stopping
server on 60020
2008-07-23 16:33:03,421 INFO org.apache.hadoop.ipc.Server: Stopping
IPC Server listener on 60020
2008-07-23 16:33:03,422 INFO org.apache.hadoop.ipc.Server: Stopping
IPC Server Responder
2008-07-23 16:33:03,437 INFO org.apache.hadoop.hbase.HRegionServer:
Stopping infoServer
2008-07-23 16:33:03,537 INFO org.mortbay.util.ThreadedServer: Stopping
Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2008-07-23 16:33:04,114 INFO org.mortbay.http.SocketListener: Stopped
SocketListener on 0.0.0.0:60030
2008-07-23 16:34:59,945 INFO org.mortbay.util.Container: Stopped
HttpContext[/static,/static]
2008-07-23 16:34:59,945 INFO org.apache.hadoop.hbase.HRegionServer:
worker thread exiting
2008-07-23 16:35:15,596 INFO org.apache.hadoop.hbase.HRegionServer:
regionserver/0.0.0.0:60020.compactor exiting
2008-07-23 16:35:15,715 INFO org.mortbay.util.Container: Stopped
HttpContext[/logs,/logs]
2008-07-23 16:35:15,729 INFO org.apache.hadoop.hbase.HRegionServer:
regionserver/0.0.0.0:60020.cacheFlusher exiting
2008-07-23 16:35:15,948 INFO org.mortbay.util.Container: Stopped
[EMAIL PROTECTED]
2008-07-23 16:35:17,528 INFO org.mortbay.util.Container: Stopped
WebApplicationContext[/,/]
2008-07-23 16:35:17,544 INFO org.mortbay.util.Container: Stopped
[EMAIL PROTECTED]
2008-07-23 16:35:17,544 DEBUG org.apache.hadoop.hbase.HLog: closing
log writer in hdfs://namenode:4010/hbase/
log_10.1.2.8_1216843622025_60020
2008-07-23 16:35:17,773 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 60020, call batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773
, 9223372036854775807,
[EMAIL PROTECTED]) from 10.1.2.7:52761:
error: java.io.IOException: Server not running
java.io.IOException: Server not running
at org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:
1758)
at
org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:
1547)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:
25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
2008-07-23 16:35:17,802 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 60020, call batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773
, 9223372036854775807,
[EMAIL PROTECTED]) from 10.1.2.8:49245:
error: java.io.IOException: Server not running
java.io.IOException: Server not running
at org.apache.hadoop.hbase.HRegionServer.checkOpen(HRegionServer.java:
1758)
at
org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:
1547)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun
.reflect
.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:
25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:413)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
2
2008-07-23 16:35:18,735 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 60020: exiting
2008-07-23 16:35:18,735 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 60020 caught: java.lang.NullPointerException
at org.apache.hadoop.ipc.Server.closeConnection(Server.java:976)
at org.apache.hadoop.ipc.Server.access$900(Server.java:66)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:
646)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:659)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:922)
2008-07-23 16:35:18,736 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 60020: exiting
2008-07-23 16:35:18,774 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773
, 9223372036854775807,
[EMAIL PROTECTED]) from 10.1.2.8:49245:
output error
2008-07-23 16:35:18,774 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 60020 caught: java.lang.NullPointerException
at org.apache.hadoop.ipc.Server.closeConnection(Server.java:976)
at org.apache.hadoop.ipc.Server.access$900(Server.java:66)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:
646)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:659)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:922)
2008-07-23 16:35:18,774 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 60020: exiting
2008-07-23 16:35:18,791 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call batchUpdate(anansi,http://com.davidhellmann.www/,1216150597773
, 9223372036854775807,
[EMAIL PROTECTED]) from 10.1.2.4:43896:
output error
2008-07-23 16:35:18,808 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 60020 caught: java.lang.NullPointerException
at org.apache.hadoop.ipc.Server.closeConnection(Server.java:976)
at org.apache.hadoop.ipc.Server.access$900(Server.java:66)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:
646)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:659)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:922)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun
.reflect
.NativeConstructorAccessorImpl
.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun
.reflect
.DelegatingConstructorAccessorImpl
.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org
.apache
.hadoop
.hbase
.RemoteExceptionHandler
.decodeRemoteException(RemoteExceptionHandler.java:82)
at
org
.apache
.hadoop
.hbase
.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:873)
at java.lang.Thread.run(Thread.java:619)
2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegionServer:
closing region anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegion:
compactions and cache flushes disabled for region anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegion: new
updates and scanners for region anansi,http://org.himssconference.www/,1216843897872
disabled
2008-07-23 16:35:19,221 DEBUG org.apache.hadoop.hbase.HRegion: no more
active scanners for region anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:35:19,239 DEBUG org.apache.hadoop.hbase.HRegion: no more
row locks outstanding on region anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:35:19,239 DEBUG org.apache.hadoop.hbase.HStore: closed
1854090017/out_edge
2008-07-23 16:35:19,240 DEBUG org.apache.hadoop.hbase.HStore: closed
1854090017/attrib
2008-07-23 16:35:19,240 DEBUG org.apache.hadoop.hbase.HStore: closed
1854090017/in_edge
2008-07-23 16:35:19,240 DEBUG org.apache.hadoop.hbase.HStore: closed
1854090017/update
2008-07-23 16:35:19,240 INFO org.apache.hadoop.hbase.HRegion: closed
anansi,http://org.himssconference.www/,1216843897872
2008-07-23 16:35:19,240 INFO org.apache.hadoop.hbase.HRegionServer:
aborting server at: 10.1.2.8:60020
2008-07-23 16:35:19,240 INFO org.apache.hadoop.hbase.HRegionServer:
regionserver/0.0.0.0:60020 exiting
2008-07-23 16:35:19,585 INFO org.apache.hadoop.hbase.HRegionServer:
Starting shutdown thread.
2008-07-23 16:35:19,585 INFO org.apache.hadoop.hbase.HRegionServer:
Shutdown thread complete
William Clay Moody
[EMAIL PROTECTED]