Yes. That's the dump of the hbase view on your schema. Maybe I was just
reading it wrong.
St.Ack
On Wed, Aug 11, 2010 at 11:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Vlad, my colleague said we don't have 22 CFs.
>
> Stack:
> Did you get that number from this:
> stores=22
>
> On Tue, Aug 10, 2010 at 9:38 PM, Stack <st...@duboce.net> wrote:
>> Ted:
>>
>> You have 22 column families in your schema? Do you need that many?
>> Run with fewer if you can, because 22 CFs takes you into a category
>> that not many hang out in. It may be at the root of the OOME.
>>
>> Otherwise, it's the usual suspects -- a bad record, perhaps? One that
>> was incorrectly formatted so it had a very large size on it?
>>
>> Do you run w/ GC logging enabled? If not, try it. Apparently it's
>> near to frictionless. It might give us more clues.
>>
>> Also, when the RS crashes, it'll dump heap by default. Do you see it?
>> If you put it someplace that I can pull, I'll take a look at it.
>>
>> St.Ack
>>
>> On Tue, Aug 10, 2010 at 9:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > We use 0.20.6 with HBASE-2473.
>> > As you can see from the following region server log snippet, OOME
>> > happened to this RS:
>> >
>> > 2010-08-11 03:59:12,760 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> > Blocking updates for 'IPC Server handler 17 on 60020' on region
>> > 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
>> > memstore size 1.0g is >= than blocking 1.0g size
>> > 2010-08-11 03:59:16,853 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> > Blocking updates for 'IPC Server handler 24 on 60020' on region
>> > 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
>> > memstore size 1.0g is >= than blocking 1.0g size
>> > 2010-08-11 03:59:44,524 FATAL
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
>> > aborting.
>> > java.lang.OutOfMemoryError: Java heap space
>> >     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>> >     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:825)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>> > 2010-08-11 03:59:44,525 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>> > request=0.0, regions=9, stores=22, storefiles=4, storefileIndexSize=5,
>> > memstoreSize=1502, compactionQueueSize=0, usedHeap=*3929*, maxHeap=3973,
>> > blockCacheSize=6836104, blockCacheFree=826362424, blockCacheCount=0,
>> > blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>> >
>> > Among the other RS, the highest usedHeap is 1750.
>> >
>> > On Sat, Jul 31, 2010 at 3:31 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> >> Hi,
>> >>
>> >> #3 is going to be tricky... due to the ebb and flow of the GC, this
>> >> value isn't as accurate as one would wish. Furthermore, we flush
>> >> memstores based on RAM pressure.
>> >>
>> >> Any algorithm would have to have the property of being stable and
>> >> conservative... rebalancing is not a zero-impact operation.
>> >>
>> >> There are JIRAs open for rebalancing based on load. To date it hasn't
>> >> been a practical problem here at SU in our prod clusters, however.
>> >>
>> >> On Jul 31, 2010 3:18 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>> >> > Hi,
>> >> > Currently load balancing only considers region count.
>> >> > See ServerManager.getAverageLoad()
>> >> >
>> >> > I think load balancing should consider the following three factors
>> >> > for each RS:
>> >> > 1. number of regions it hosts
>> >> > 2.
number of requests it serves within a given period
>> >> > 3. how close usedHeap is to maxHeap
>> >> >
>> >> > Please comment on how we should weigh the above three factors in
>> >> > deciding the regions to offload from each RS.
>> >> >
>> >> > Thanks
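[Editor's note: Stack's suggestion above to run with GC logging enabled can be done with the standard HotSpot flags of that era, added to the region server JVM options. This is only an example; the `HBASE_OPTS` variable in `hbase-env.sh` is the usual place, and the log path is a placeholder.]

```shell
# Example only: turn on verbose GC logging for the HBase daemons
# (typically placed in conf/hbase-env.sh). Log path is a placeholder.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-regionserver.log"
```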
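[Editor's note: the stack trace shows the OOME happening at `ByteBuffer.allocate` inside `HBaseServer$Connection.readAndProcess`, which fits Stack's "bad record with a very large size" theory: a server that allocates a buffer sized by a length field read off the wire can be driven to OOME by one corrupt frame. The sketch below is hypothetical illustration of that failure mode and a cheap guard, not HBase's actual code; `FrameReader`, `allocateFrame`, and the 64 MB cap are invented.]

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: guard a wire-declared frame length before
// allocating, so one corrupt record cannot OOME the whole server.
public class FrameReader {
    // Assumed sanity cap on a single frame; 64 MB is an arbitrary choice.
    static final int MAX_FRAME = 64 * 1024 * 1024;

    static ByteBuffer allocateFrame(int declaredLength) {
        // Reject negative or absurdly large declared sizes up front,
        // instead of letting ByteBuffer.allocate blow the heap.
        if (declaredLength < 0 || declaredLength > MAX_FRAME) {
            throw new IllegalArgumentException(
                "suspicious frame length: " + declaredLength);
        }
        return ByteBuffer.allocate(declaredLength);
    }
}
```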
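[Editor's note: Ted's three factors plus Ryan's caveat (the balancer must be stable and conservative, since rebalancing is not a zero-impact operation) can be combined in a weighted score with a dead band, so regions are moved only when a server clearly exceeds the cluster average. This is a hypothetical sketch, not HBase's balancer; the `LoadScore` class, the weights, and the 20% slop are invented for illustration.]

```java
// Hypothetical multi-factor load score for a region server.
public class LoadScore {
    // Weights for: region count, request rate, heap pressure (sum to 1.0).
    // Purely illustrative choices.
    static final double W_REGIONS = 0.4, W_REQUESTS = 0.3, W_HEAP = 0.3;
    // Dead band: tolerate this much imbalance before moving anything,
    // keeping the balancer conservative and stable.
    static final double SLOP = 0.2;

    /** Load relative to cluster averages; ~1.0 means "about average". */
    static double score(int regions, double avgRegions,
                        double requests, double avgRequests,
                        long usedHeap, long maxHeap) {
        double r = regions / avgRegions;
        double q = avgRequests == 0 ? 0 : requests / avgRequests;
        double h = (double) usedHeap / maxHeap;  // heap pressure, 0..1
        return W_REGIONS * r + W_REQUESTS * q + W_HEAP * h;
    }

    /** Offload only when clearly above the cluster mean. */
    static boolean shouldOffload(double score, double clusterMean) {
        return score > clusterMean * (1.0 + SLOP);
    }
}
```

With the crashed server's numbers from the metrics dump (regions=9, usedHeap=3929, maxHeap=3973), the heap term alone contributes nearly its full weight, which is exactly the signal a region-count-only balancer misses.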