Hi,
DFS trouble. Have you taken the recommended steps according to this
wiki page: http://wiki.apache.org/hadoop/Hbase/Troubleshooting ?
Try the steps for #5, #6, and #7.
And/or, try adding more data nodes to spread the load.
Hope that helps,
- Andy
> 2009-04-14 16:17:08,718 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239696959813 whose highest sequence/edit id is 122635282
> 2009-04-14 16:17:14,932 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697028652
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
>         at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:697)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>         at $Proxy1.addBlock(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy1.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2823)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2705)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
>
> 2009-04-14 16:17:14,932 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697028652 retries left 4
> 2009-04-14 16:17:15,499 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://compute-11-5.local:11004/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697021646, entries=100003. New log writer: /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697035433
>
> .................................
>
>
> 2009-04-14 17:18:44,259 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239700723643 retries left 4
> 2009-04-14 17:18:44,663 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239700723643
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
>         at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:697)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>         at $Proxy1.addBlock(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy1.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2823)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2705)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
>
> 2009-04-14 17:18:44,663 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping
>
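The "NotReplicatedYetException sleeping ... retries left 4" lines above show DFSClient's bounded retry loop: it sleeps a little longer after each failed block allocation and gives up once the retry budget is exhausted. A minimal sketch of that pattern in Python, with `allocate` as a hypothetical stand-in for the namenode addBlock RPC:

```python
import time

def add_block_with_retry(allocate, retries=5, base_sleep=0.4):
    """Bounded retry in the spirit of DFSClient's handling of
    NotReplicatedYetException: back off a little more on each failed
    attempt, re-raise once the retry budget is exhausted."""
    for attempt in range(retries):
        try:
            return allocate()
        except RuntimeError:  # stand-in for NotReplicatedYetException
            if attempt == retries - 1:
                raise
            time.sleep(base_sleep * (attempt + 1))
```

If the namenode stays behind on replication for longer than the whole retry budget, the block allocation fails upstream and the write is lost.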
>
> There are 8 cores on each node, and we configured 4 map tasks to run
> simultaneously. Are we running at too high a concurrency rate?
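For a rough sense of scale on that question, here is back-of-envelope arithmetic using the numbers from this thread (32 nodes, 4 maps per node, replication 3). The one-thread-per-replica accounting is an assumption, not an exact model of the datanode:

```python
# Back-of-envelope estimate of concurrent HDFS write load for this
# cluster; illustrative only.
nodes = 32
maps_per_node = 4
replication = 3

writers = nodes * maps_per_node         # concurrent client write streams
pipeline_slots = writers * replication  # roughly one datanode thread per replica
per_datanode = pipeline_slots // nodes  # average load per datanode

print(writers, pipeline_slots, per_datanode)  # 128 384 12
```

Even allowing for HLog rolls, flushes, and compactions on top of that, the estimate is far below the configured dfs.datanode.max.xcievers of 8192, which suggests the NotReplicatedYetException is the namenode lagging on block replication rather than datanodes running out of threads.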
>
>
> 2009/4/14 11 Nov. <[email protected]>
>
> > hi JD,
> > I tried your solution by upgrading hbase to 0.19.1 and applying the
> > patch. The inserting mapreduce application had been running for more
> > than half an hour when we lost one region server; here is the log on
> > the lost region server:
> >
> > 2009-04-14 16:08:11,483 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
> > org.apache.hadoop.hbase.DroppedSnapshotException: region: CDR,000220285104,1239696381168
> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:897)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:790)
> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:228)
> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:138)
> > Caused by: java.lang.ClassCastException: [B cannot be cast to org.apache.hadoop.hbase.HStoreKey
> >         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:679)
> >         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:636)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:882)
> >         ... 3 more
> > 2009-04-14 16:08:11,553 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=13, stores=13, storefiles=63, storefileIndexSize=6, memcacheSize=206, usedHeap=631, maxHeap=4991
> > 2009-04-14 16:08:11,553 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: regionserver/0:0:0:0:0:0:0:0:62020.cacheFlusher exiting
> > 2009-04-14 16:08:12,502 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@7075ae,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@573df2bb)
> from
> > 192.168.33.211:33093: error: java.io.IOException:
> Server not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:12,502 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@240affbc,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@4e1ba220)
> from
> > 192.168.33.212:48018: error: java.io.IOException:
> Server not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:12,502 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@78310aef,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@5bc50e8e)
> from
> > 192.168.33.253:48798: error: java.io.IOException:
> Server not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:12,503 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@663ebbb3,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@20951936)
> from
> > 192.168.34.2:52907: error: java.io.IOException: Server
> not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:12,503 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@1caa38f0,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@6b802343)
> from
> > 192.168.33.238:34167: error: java.io.IOException:
> Server not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:12,503 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@298b3ad8,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@73c45036)
> from
> > 192.168.33.236:45877: error: java.io.IOException:
> Server not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:12,503 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020, call batchUpdates([...@5d6e449a,
> > [Lorg.apache.hadoop.hbase.io.BatchUpdate;@725a0a61)
> from
> > 192.168.33.254:35363: error: java.io.IOException:
> Server not running,
> > aborting
> > java.io.IOException: Server not running, aborting
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
> > at
> sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at
> java.lang.reflect.Method.invoke(Method.java:597)
> > at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > 2009-04-14 16:08:13,370 INFO
> org.apache.hadoop.ipc.HBaseServer: Stopping
> > server on 62020
> > 2009-04-14 16:08:13,370 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 5 on 62020: exiting
> > 2009-04-14 16:08:13,370 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> Stopping infoServer
> > 2009-04-14 16:08:13,370 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 16 on 62020: exiting
> > 2009-04-14 16:08:13,370 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 3 on 62020: exiting
> > 2009-04-14 16:08:13,370 INFO
> org.mortbay.util.ThreadedServer: Stopping
> > Acceptor
> ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: Stopping
> > IPC Server listener on 62020
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 1 on 62020: exiting
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 2 on 62020: exiting
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 4 on 62020: exiting
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 6 on 62020: exiting
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 10 on 62020: exiting
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 8 on 62020: exiting
> > 2009-04-14 16:08:13,370 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 0 on 62020: exiting
> > 2009-04-14 16:08:13,371 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 11 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 9 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 13 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 14 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 15 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 17 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 19 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 18 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 12 on 62020: exiting
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: Stopping
> > IPC Server Responder
> > 2009-04-14 16:08:13,372 INFO
> org.apache.hadoop.ipc.HBaseServer: IPC Server
> > handler 7 on 62020: exiting
> > 2009-04-14 16:08:13,464 INFO
> org.mortbay.http.SocketListener: Stopped
> > SocketListener on 0.0.0.0:62030
> > 2009-04-14 16:08:13,471 INFO
> org.mortbay.util.Container: Stopped
> > HttpContext[/logs,/logs]
> > 2009-04-14 16:08:13,471 INFO
> org.mortbay.util.Container: Stopped
> >
> org.mortbay.jetty.servlet.webapplicationhand...@460c5e9c
> > 2009-04-14 16:08:14,887 INFO
> > org.apache.hadoop.hbase.regionserver.LogFlusher:
> > regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
> > 2009-04-14 16:08:14,890 INFO
> org.apache.hadoop.hbase.Leases:
> > regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker
> closing leases
> > 2009-04-14 16:08:14,890 INFO
> org.mortbay.util.Container: Stopped
> > WebApplicationContext[/static,/static]
> > 2009-04-14 16:08:14,890 INFO
> org.apache.hadoop.hbase.Leases:
> > regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closed
> leases
> > 2009-04-14 16:08:14,890 INFO
> org.mortbay.util.Container: Stopped
> >
> org.mortbay.jetty.servlet.webapplicationhand...@62c2ee15
> > 2009-04-14 16:08:14,896 INFO
> org.mortbay.util.Container: Stopped
> > WebApplicationContext[/,/]
> > 2009-04-14 16:08:14,896 INFO
> org.mortbay.util.Container: Stopped
> > org.mortbay.jetty.ser...@3f829e6f
> > 2009-04-14 16:08:14,896 INFO
> > org.apache.hadoop.hbase.regionserver.LogRoller:
> LogRoller exiting.
> > 2009-04-14 16:08:14,896 INFO
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
> >
> regionserver/0:0:0:0:0:0:0:0:62020.majorCompactionChecker
> exiting
> > 2009-04-14 16:08:14,969 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: On
> abort, closed hlog
> > 2009-04-14 16:08:14,969 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000145028698,1239695232467
> > 2009-04-14 16:08:14,970 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000485488629,1239696366886
> > 2009-04-14 16:08:14,970 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000030226388,1239695919978
> > 2009-04-14 16:08:14,971 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000045007972,1239696394474
> > 2009-04-14 16:08:14,971 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000370014326,1239695407460
> > 2009-04-14 16:08:17,790 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> worker thread exiting
> > 2009-04-14 16:08:46,566 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > compaction completed on region
> CDR,000315256623,1239695638429 in 1mins, 3sec
> > 2009-04-14 16:08:46,566 INFO
> >
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> > regionserver/0:0:0:0:0:0:0:0:62020.compactor exiting
> > 2009-04-14 16:08:46,567 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000315256623,1239695638429
> > 2009-04-14 16:08:46,568 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000555259592,1239696091451
> > 2009-04-14 16:08:46,569 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000575345572,1239696111244
> > 2009-04-14 16:08:46,570 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000515619625,1239696375751
> > 2009-04-14 16:08:46,570 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000525154897,1239695988209
> > 2009-04-14 16:08:46,570 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000220285104,1239696381168
> > 2009-04-14 16:08:46,571 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000045190615,1239696394474
> > 2009-04-14 16:08:46,572 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed CDR,000555161660,1239696091451
> > 2009-04-14 16:08:46,572 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> aborting server at:
> > 192.168.33.215:62020
> > 2009-04-14 16:08:46,684 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > regionserver/0:0:0:0:0:0:0:0:62020 exiting
> > 2009-04-14 16:08:46,713 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> Starting shutdown
> > thread.
> > 2009-04-14 16:08:46,714 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> Shutdown thread complete
> >
> >
> > I restarted this region server and now it seems to be working
> > just fine.
> >
> >
> > 2009/4/14 11 Nov. <[email protected]>
> >
> >> hi Jean-Daniel,
> >> As you said, we were inserting data using a sequential pattern, and
> >> if we used a random pattern there would not be such a problem.
> >> I'm trying hbase 0.19.1 and the patch now.
> >> Thanks!
> >>
> >> 2009/4/13 Jean-Daniel Cryans <[email protected]>
> >>
> >>> I see that your region server had 5188 store files in 121 stores; I'm
> >>> 99% sure that's the cause of your OOME. Luckily for you, we've been
> >>> working on this issue since last week. What you should do:
> >>>
> >>> - Upgrade to HBase 0.19.1
> >>>
> >>> - Apply the latest patch in
> >>> https://issues.apache.org/jira/browse/HBASE-1058 (the v3)
> >>>
> >>> Then you should be good. As to what caused this huge number of store
> >>> files, I wouldn't be surprised if your data was uploaded sequentially,
> >>> which would mean that whatever the number of regions (hence the level
> >>> of distribution) in your table, only one region gets the load.
> >>> This implies that another workaround for your problem would be to
> >>> insert with a more randomized pattern.
> >>>
> >>> Thx for trying either solution,
> >>>
> >>> J-D
> >>>
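J-D's sequential-upload point can be made concrete: with a monotonically increasing row key, every insert targets the region holding the tail of the key space, so a single region server absorbs the entire write load. A common workaround is to prefix keys with a hash-derived bucket. This is only a sketch (the bucket count and key format are invented here), and note that salting keys this way sacrifices ordered scans over the original key:

```python
import hashlib

BUCKETS = 16  # invented bucket count for illustration

def salted(row_key: str) -> str:
    """Prefix the key with a stable hash bucket so writes spread across
    the key space, and hence across regions, instead of all landing in
    the tail region."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % BUCKETS
    return "%02d-%s" % (bucket, row_key)

# Sequential keys, shaped like the CDR row keys in the logs above
keys = ["%012d" % i for i in range(1000)]
prefixes = {salted(k).split("-")[0] for k in keys}
# Many distinct prefixes: the write load now spreads over many regions.
```

Because the bucket is derived from the key itself, reads can recompute the same prefix, so point lookups still work; only range scans over the unsalted order are lost.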
> >>> On Mon, Apr 13, 2009 at 8:28 AM, 11 Nov. <[email protected]> wrote:
> >>> > hi colleagues,
> >>> > We have recently been inserting data on a 32-node hbase cluster
> >>> > using the mapreduce framework, but the operation always fails
> >>> > because of regionserver exceptions. We issue 4 map tasks on the
> >>> > same node simultaneously, and use the BatchUpdate() function to
> >>> > handle the work of inserting data.
> >>> > We have been suffering from this problem since last month; it only
> >>> > takes place on relatively large clusters at a high concurrent
> >>> > inserting rate. We are using hadoop-0.19.2 from current svn (the
> >>> > head revision as of last week), and hbase 0.19.0.
> >>> >
> >>> > Here is the hadoop-site.xml config file:
> >>> >
> >>> > <configuration>
> >>> >
> >>> > <property>
> >>> >   <name>fs.default.name</name>
> >>> >   <value>hdfs://192.168.33.204:11004/</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.job.tracker</name>
> >>> >   <value>192.168.33.204:11005</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.secondary.http.address</name>
> >>> >   <value>0.0.0.0:51100</value>
> >>> >   <description>The secondary namenode http server address and port.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.address</name>
> >>> >   <value>0.0.0.0:51110</value>
> >>> >   <description>The address where the datanode server will listen to.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.http.address</name>
> >>> >   <value>0.0.0.0:51175</value>
> >>> >   <description>The datanode http server address and port.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.ipc.address</name>
> >>> >   <value>0.0.0.0:11010</value>
> >>> >   <description>The datanode ipc server address and port.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.handler.count</name>
> >>> >   <value>30</value>
> >>> >   <description>The number of server threads for the datanode.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.namenode.handler.count</name>
> >>> >   <value>30</value>
> >>> >   <description>The number of server threads for the namenode.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.job.tracker.handler.count</name>
> >>> >   <value>30</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.reduce.parallel.copies</name>
> >>> >   <value>30</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.http.address</name>
> >>> >   <value>0.0.0.0:51170</value>
> >>> >   <description>The address and the base port where the dfs namenode
> >>> >   web ui will listen on.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.max.xcievers</name>
> >>> >   <value>8192</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.socket.write.timeout</name>
> >>> >   <value>0</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.datanode.https.address</name>
> >>> >   <value>0.0.0.0:50477</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.https.address</name>
> >>> >   <value>0.0.0.0:50472</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.job.tracker.http.address</name>
> >>> >   <value>0.0.0.0:51130</value>
> >>> >   <description>The job tracker http server address and port the
> >>> >   server will listen on.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.task.tracker.http.address</name>
> >>> >   <value>0.0.0.0:51160</value>
> >>> >   <description>The task tracker http server address and port.
> >>> >   If the port is 0 then the server will start on a free port.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.map.tasks</name>
> >>> >   <value>3</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.reduce.tasks</name>
> >>> >   <value>2</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.tasktracker.map.tasks.maximum</name>
> >>> >   <value>4</value>
> >>> >   <description>The maximum number of map tasks that will be run
> >>> >   simultaneously by a task tracker.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.name.dir</name>
> >>> >   <value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.data.dir</name>
> >>> >   <value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>fs.checkpoint.dir</name>
> >>> >   <value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.system.dir</name>
> >>> >   <value>/data1/hbase/filesystem/mapred/system</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.local.dir</name>
> >>> >   <value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>dfs.replication</name>
> >>> >   <value>3</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hadoop.tmp.dir</name>
> >>> >   <value>/data1/hbase/filesystem/tmp</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>mapred.task.timeout</name>
> >>> >   <value>3600000</value>
> >>> >   <description>The number of milliseconds before a task will be
> >>> >   terminated if it neither reads an input, writes an output, nor
> >>> >   updates its status string.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>ipc.client.idlethreshold</name>
> >>> >   <value>4000</value>
> >>> >   <description>Defines the threshold number of connections after
> >>> >   which connections will be inspected for idleness.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>ipc.client.connection.maxidletime</name>
> >>> >   <value>120000</value>
> >>> >   <description>The maximum time in msec after which a client will
> >>> >   bring down the connection to the server.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <value>-Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode</value>
> >>> > </property>
> >>> >
> >>> > </configuration>
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > And here is the hbase-site.xml config file:
> >>> >
> >>> > <?xml version="1.0"?>
> >>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >>> >
> >>> > <configuration>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.master</name>
> >>> >   <value>192.168.33.204:62000</value>
> >>> >   <description>The host and port that the HBase master runs at.
> >>> >   A value of 'local' runs the master and a regionserver in
> >>> >   a single process.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.rootdir</name>
> >>> >   <value>hdfs://192.168.33.204:11004/hbase</value>
> >>> >   <description>The directory shared by region servers.
> >>> >   Should be fully-qualified to include the filesystem to use.
> >>> >   E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.master.info.port</name>
> >>> >   <value>62010</value>
> >>> >   <description>The port for the hbase master web UI.
> >>> >   Set to -1 if you do not want the info server to run.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.master.info.bindAddress</name>
> >>> >   <value>0.0.0.0</value>
> >>> >   <description>The address for the hbase master web UI.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.regionserver</name>
> >>> >   <value>0.0.0.0:62020</value>
> >>> >   <description>The host and port a HBase region server runs at.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.regionserver.info.port</name>
> >>> >   <value>62030</value>
> >>> >   <description>The port for the hbase regionserver web UI.
> >>> >   Set to -1 if you do not want the info server to run.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.regionserver.info.bindAddress</name>
> >>> >   <value>0.0.0.0</value>
> >>> >   <description>The address for the hbase regionserver web UI.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.regionserver.handler.count</name>
> >>> >   <value>20</value>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hbase.master.lease.period</name>
> >>> >   <value>180000</value>
> >>> > </property>
> >>> >
> >>> > </configuration>
> >>> >
> >>> >
> >>> > Here is a slice of the error log from one of the failed
> >>> > regionservers, which stopped responding after the OOM exception:
> >>> >
> >>> > 2009-04-13 15:20:26,077 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> >>> > java.lang.OutOfMemoryError: Java heap space
> >>> > 2009-04-13 15:20:48,062 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4991, maxHeap=4991
> >>> > 2009-04-13 15:20:48,062 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 62020
> >>> > 2009-04-13 15:20:48,063 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
> >>> > 2009-04-13 15:20:48,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
> >>> > 2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@74f0bb4e, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@689939dc) from 192.168.33.206:47754: output error
> >>> > 2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020: exiting
> >>> > 2009-04-13 15:20:48,297 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> >>> > 2009-04-13 15:20:48,552 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /192.168.33.204:2181
> >>> > 2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@480edf31
> >>> > java.io.IOException: TIMED OUT
> >>> >         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
> >>> > 2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020, call batchUpdates([...@3509aa7f, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@d98930d) from 192.168.33.234:44367: error: java.io.IOException: Server not running, aborting
> >>> > java.io.IOException: Server not running, aborting
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
> >>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >>> > 2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@525a19ce, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@19544d9f) from 192.168.33.208:47852: output error
> >>> > 2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@483206fe, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@4c6932b9) from 192.168.33.221:37020: output error
> >>> > 2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
> >>> > 2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
> >>> > 2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@61af3c0e, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@378fed3c) from 192.168.34.1:35923: output error
> >>> > 2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@2c4ff8dd, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@365b8be5) from 192.168.34.3:39443: output error
> >>> > 2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
> >>> > 2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@343d8344, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@32750027) from 192.168.33.236:45479: output error
> >>> > 2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
> >>> > 2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@3ff34fed, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@7f047167) from 192.168.33.219:40059: output error
> >>> > 2009-04-13 15:20:48,654 ERROR com.cmri.hugetable.zookeeper.ZNodeWatcher: processNode /hugetable09/hugetable/acl.lock error! KeeperErrorCode = ConnectionLoss
> >>> > 2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@721d9b81, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@75cc6cae) from 192.168.33.254:51617: output error
> >>> > 2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 62020, call batchUpdates([...@655edc27, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@36c7b86f) from 192.168.33.238:51231: error: java.io.IOException: Server not running, aborting
> >>> > java.io.IOException: Server not running, aborting
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
> >>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >>> > 2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@3c853cce, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@4f5b176c) from 192.168.33.209:43520: output error
> >>> > 2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
> >>> > 2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@3509aa7f, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@d98930d) from 192.168.33.234:44367: output error
> >>> > 2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
> >>> > 2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
> >>> > 2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 62020, call batchUpdates([...@2cc91b6, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@44724529) from 192.168.33.210:44154: error: java.io.IOException: Server not running, aborting
> >>> > java.io.IOException: Server not running, aborting
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
> >>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >>> > 2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@e8136e0, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@4539b390) from 192.168.33.217:60476: output error
> >>> > 2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@2cc91b6, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@44724529) from 192.168.33.210:44154: output error
> >>> > 2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
> >>> > 2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@655edc27, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@36c7b86f) from 192.168.33.238:51231: output error
> >>> > 2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020 caught: java.nio.channels.ClosedByInterruptException
> >>> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > 2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
> >>> > 2009-04-13 15:20:49,048 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.OutOfMemoryError: Java heap space
> >>> > 2009-04-13 15:20:49,484 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> >>> > java.lang.OutOfMemoryError: Java heap space
> >>> >         at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
> >>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
> >>> >         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> >>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >>> > 2009-04-13 15:20:49,488 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4985, maxHeap=4991
> >>> > 2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020, call batchUpdates([...@302bb17f, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@492218e) from 192.168.33.235:35276: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> >>> > java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
> >>> >         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> >>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >>> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >>> >         at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
> >>> >         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
> >>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
> >>> >         ... 5 more
> >>> > 2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call batchUpdates([...@302bb17f, [Lorg.apache.hadoop.hbase.io.BatchUpdate;@492218e) from 192.168.33.235:35276: output error
> >>> > 2009-04-13 15:20:49,047 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 62020
> >>> > 2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020 caught: java.nio.channels.ClosedChannelException
> >>> >         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
> >>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
> >>> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
> >>> >
> >>> > Any suggestion is welcome! Thanks a lot!
> >>> >
> >>>
> >>
> >>
> >
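[Editor's note: a quick back-of-envelope on the metrics dumps quoted above. The figures are copied from the log; the inference that the store-file count and the exhausted heap are the pressure points is a reading of the numbers, not a measurement.]

```shell
# Figures taken from the "Dump of metrics" log lines quoted above.
storefiles=5188
regions=121
max_heap_mb=4991
used_heap_mb=4991

# Roughly how many store files each region is carrying; a well-compacted
# region usually carries far fewer, so a backlog this size suggests
# compactions were not keeping up with the write load.
echo "store files per region: $((storefiles / regions))"    # prints 42

# Heap headroom at the moment the OutOfMemoryError fired.
echo "heap headroom (MB): $((max_heap_mb - used_heap_mb))"  # prints 0
```

With zero headroom and ~42 store files per region, the usual ways out are a larger heap, fewer regions per server (more nodes), or throttling the load until compactions catch up.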