Here is the client-side stack trace:

java.io.IOException: Call to us01-ciqps1-grid01.carrieriq.com/10.32.42.233:60020 failed on local exception: java.io.EOFException
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1037)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1222)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1144)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
        at com.carrieriq.m2m.platform.mmp2.input.StripedHBaseTable.flushAllStripesNew(StripedHBaseTable.java:300)

On Tue, Aug 10, 2010 at 11:01 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Use a tool like Yourkit to grovel that heap; the open source tools are
> not really there yet.
>
> But your stack trace tells a lot.... the fatal allocation is in the
> RPC layer. Either a client is sending a massive value, or you have a
> semi-hostile network client sending bytes to your open socket which
> are being interpreted as the buffer size to allocate. If you look at
> the actual RPC code (any RPC code, really) there is often a 'length'
> field which is then used to allocate a dynamic buffer.
>
> -ryan
>
> On Tue, Aug 10, 2010 at 10:55 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > The compressed file is still big:
> > -rw-r--r-- 1 hadoop users 809768340 Aug 11 05:49 java_pid26972.hprof.gz
> >
> > If you can tell me specific things to look for in the dump, I would
> > collect it (through jhat) and publish.
> >
> > Thanks
> >
> > On Tue, Aug 10, 2010 at 10:29 PM, Stack <st...@duboce.net> wrote:
> >
> >> On Tue, Aug 10, 2010 at 9:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >> > Here are the GC-related parameters:
> >> > /usr/java/jdk1.6/bin/java -Xmx4000m -XX:+HeapDumpOnOutOfMemoryError
> >> > -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> >> >
> >>
> >> You have > 2 CPUs per machine, I take it? You could probably drop the
> >> conservative -XX:+CMSIncrementalMode.
> >>
> >> > The heap dump is big:
> >> > -rw------- 1 hadoop users 4146551927 Aug 11 03:59 java_pid26972.hprof
> >> >
> >> > Do you have an ftp server where I can upload it?
> >> >
> >>
> >> Not really. I was hoping you could put a compressed version under an
> >> http server somewhere that I could pull from. You might as well
> >> include the GC log while you are at it.
> >>
> >> Thanks Ted,
> >>
> >> St.Ack
> >>
> >
>
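To make Ryan's point concrete, here is a minimal sketch of the length-prefixed framing pattern he describes. This is not the actual HBase RPC reader; the FrameReader class and the 64 MB cap are made up for illustration. It shows how a reader that trusts a length field from the wire can be driven into a single fatal allocation, and the usual guard of validating the length before allocating:

import java.io.DataInputStream;
import java.io.IOException;

// Sketch only: not the actual HBase RPC reader. FrameReader and the
// 64 MB cap are hypothetical illustrations of the pattern described above.
public class FrameReader {
    private static final int MAX_FRAME_BYTES = 64 * 1024 * 1024; // hypothetical cap

    // Naive: whatever four bytes arrive on the socket become the
    // allocation size. A huge (or garbage) value triggers one giant
    // byte[] allocation, the kind of fatal allocation that shows up
    // in the RPC layer of a heap dump.
    public static byte[] readFrameNaive(DataInputStream in) throws IOException {
        int length = in.readInt();      // untrusted length field
        byte[] buf = new byte[length];  // allocated before any validation
        in.readFully(buf);
        return buf;
    }

    // Guarded: reject implausible lengths before allocating.
    public static byte[] readFrameChecked(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("Suspicious frame length: " + length);
        }
        byte[] buf = new byte[length];
        in.readFully(buf);
        return buf;
    }
}

With the check in place, a semi-hostile client (or a legitimate one sending a massive value) gets an IOException and a dropped connection instead of OOMing the region server.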