Re: load balancing considerations

Ted Yu Tue, 10 Aug 2010 21:53:00 -0700

Here are GC-related parameters:
/usr/java/jdk1.6/bin/java -Xmx4000m -XX:+HeapDumpOnOutOfMemoryError
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode


The heap dump is big:
-rw------- 1 hadoop users 4146551927 Aug 11 03:59 java_pid26972.hprof

Do you have an FTP server where I can upload it?

Thanks

On Tue, Aug 10, 2010 at 9:38 PM, Stack <st...@duboce.net> wrote:

> Ted:
>
> You have 22 column families in your schema?  Do you need that many?
> Run with less if you can because 22 CFs takes you into a category that
> not many hang out in.  It may be at the root of the OOME.
>
> Otherwise, it's the usual suspects -- a bad record perhaps?  One that
> was incorrectly formatted so it had a very large size on it?
>
> Do you run w/ GC enabled?  If not, try it.  Apparently it's near to
> frictionless.  It might give us more clues.
>
> Also, when the RS crashes, it'll dump heap by default.  Do you see it?
>  If you put it someplace that I can pull, I'll take a look at it.
>
> St.Ack
>
> On Tue, Aug 10, 2010 at 9:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > We use 0.20.6 with HBASE-2473
> > As you can see from the following region server log snippet, an OOME
> > happened to this RS:
> >
> > 2010-08-11 03:59:12,760 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> > Blocking updates for 'IPC Server handler 17 on 60020' on region
> 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
> > memstore size 1.0g is >= than blocking 1.0g size
> > 2010-08-11 03:59:16,853 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> > Blocking updates for 'IPC Server handler 24 on 60020' on region
> 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
> > memstore size 1.0g is >= than blocking 1.0g size
> > 2010-08-11 03:59:44,524 FATAL
> > org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
> > aborting.
> > java.lang.OutOfMemoryError: Java heap space
> >        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> >        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:825)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
> > 2010-08-11 03:59:44,525 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> > request=0.0, regions=9, stores=22, storefiles=4, storefileIndexSize=5,
> > memstoreSize=1502, compactionQueueSize=0, usedHeap=*3929*, maxHeap=3973,
> > blockCacheSize=6836104, blockCacheFree=826362424, blockCacheCount=0,
> > blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> >
> > Among the other RS, the highest usedHeap is 1750
> >
> > On Sat, Jul 31, 2010 at 3:31 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> #3 is going to be tricky... due to the ebb and flow of the GC this value
> >> isn't as accurate as one would wish. Furthermore we flush memstores based
> >> on RAM pressure.
> >>
> >> Any algorithm would have to have the property of being stable and
> >> conservative... rebalancing is not a 0 impact operation.
> >>
> >> There are JIRAs open for the rebalance based on load. To date it hasn't
> >> been a practical problem here at SU in our prod clusters however.
> >>
> >> On Jul 31, 2010 3:18 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> >> > Hi,
> >> > Currently load balancing only considers region count.
> >> > See ServerManager.getAverageLoad()
> >> >
> >> > I think load balancing should consider the following three factors for
> >> > each RS:
> >> > 1. number of regions it hosts
> >> > 2. number of requests it serves within given period
> >> > 3. how close usedHeap is to maxHeap
> >> >
> >> > Please comment on how we should weigh the above three factors in
> >> > deciding which regions to offload from each RS.
> >> >
> >> > Thanks
> >>
> >
>
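
To make the three-factor proposal quoted at the bottom of this thread concrete, here is a minimal sketch of one way the factors could be combined into a single per-server score. It is not HBase code: `ServerLoad`, `load_score`, `rank_for_offload`, and the default weights are hypothetical names and values chosen for illustration. The heap term gets the smallest weight on the assumption, taken from Ryan's reply, that usedHeap is a noisy signal because of GC. The example input uses the usedHeap/maxHeap figures from the metrics dump above; the region count and maxHeap of the second server are assumed.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    """Hypothetical per-region-server snapshot (not an HBase class)."""
    name: str
    regions: int        # factor 1: number of regions hosted
    requests: float     # factor 2: requests served in the sampling period
    used_heap_mb: int   # factor 3: used heap ...
    max_heap_mb: int    # ... relative to max heap


def load_score(s, avg_regions, avg_requests,
               w_regions=0.5, w_requests=0.3, w_heap=0.2):
    """Weighted sum of the three factors; the weights are illustrative only."""
    # Region and request counts are normalised by the cluster average,
    # so a perfectly average server contributes 1.0 for each term.
    region_term = s.regions / avg_regions if avg_regions else 0.0
    request_term = s.requests / avg_requests if avg_requests else 0.0
    # Heap pressure is the fraction of max heap currently in use.
    heap_term = s.used_heap_mb / s.max_heap_mb if s.max_heap_mb else 0.0
    return (w_regions * region_term
            + w_requests * request_term
            + w_heap * heap_term)


def rank_for_offload(servers, **weights):
    """Return servers ordered from most to least loaded."""
    avg_regions = sum(s.regions for s in servers) / len(servers)
    avg_requests = sum(s.requests for s in servers) / len(servers)
    return sorted(
        servers,
        key=lambda s: load_score(s, avg_regions, avg_requests, **weights),
        reverse=True,
    )


if __name__ == "__main__":
    # First server: figures from the metrics dump in this thread.
    crashed = ServerLoad("rs-a", regions=9, requests=0.0,
                         used_heap_mb=3929, max_heap_mb=3973)
    # Second server: usedHeap from the thread; regions and maxHeap assumed.
    other = ServerLoad("rs-b", regions=9, requests=0.0,
                       used_heap_mb=1750, max_heap_mb=3973)
    print([s.name for s in rank_for_offload([other, crashed])])
```

With equal region counts and no requests, only the heap term separates the two servers, so the one near its heap limit ranks first. A real balancer would also need the stability and conservatism Ryan mentions, for example by smoothing usedHeap over several samples before acting on it.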
