Yes. That's the dump of the hbase view on your schema. Maybe I was just
reading it wrong.
St.Ack
On Wed, Aug 11, 2010 at 11:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Vlad, my colleague said we don't have 22 CFs.
>
> Stack:
> Did you get that number from this:
> stores=22
>
> On Tue, Aug 10, 2010 at 9:38 PM, Stack <st...@duboce.net> wrote:
>> Ted:
>>
>> You have 22 column families in your schema? Do you need that many?
>> Run with fewer if you can, because 22 CFs takes you into a category
>> that not many hang out in. It may be at the root of the OOME.
>>
>> Otherwise, it's the usual suspects -- a bad record, perhaps? One that
>> was incorrectly formatted so it had a very large size on it?
>>
>> Do you run w/ GC logging enabled? If not, try it. Apparently it's
>> near to frictionless. It might give us more clues.
>>
>> Also, when the RS crashes, it'll dump heap by default. Do you see it?
>> If you put it someplace that I can pull, I'll take a look at it.
>>
>> St.Ack
>>
>> On Tue, Aug 10, 2010 at 9:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > We use 0.20.6 with HBASE-2473.
>> > As you can see from the following region server log snippet, OOME
>> > happened to this RS:
>> >
>> > 2010-08-11 03:59:12,760 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> > Blocking updates for 'IPC Server handler 17 on 60020' on region
>> > 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
>> > memstore size 1.0g is >= than blocking 1.0g size
>> > 2010-08-11 03:59:16,853 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> > Blocking updates for 'IPC Server handler 24 on 60020' on region
>> > 2__HB_NOINC_GRID_0809-THREEGPPSPEECHCALLS-1281499094297,\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E\x0E,1281499095128:
>> > memstore size 1.0g is >= than blocking 1.0g size
>> > 2010-08-11 03:59:44,524 FATAL
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
>> > aborting.
>> > java.lang.OutOfMemoryError: Java heap space
>> >     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>> >     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:825)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>> > 2010-08-11 03:59:44,525 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
>> > request=0.0, regions=9, stores=22, storefiles=4, storefileIndexSize=5,
>> > memstoreSize=1502, compactionQueueSize=0, usedHeap=*3929*, maxHeap=3973,
>> > blockCacheSize=6836104, blockCacheFree=826362424, blockCacheCount=0,
>> > blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
>> >
>> > Among the other RS, the highest usedHeap is 1750.
>> >
>> > On Sat, Jul 31, 2010 at 3:31 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> >> Hi,
>> >>
>> >> #3 is going to be tricky... due to the ebb and flow of the GC, this
>> >> value isn't as accurate as one would wish. Furthermore, we flush
>> >> memstores based on RAM pressure.
>> >>
>> >> Any algorithm would have to have the property of being stable and
>> >> conservative... rebalancing is not a zero-impact operation.
>> >>
>> >> There are JIRAs open for rebalancing based on load. To date it hasn't
>> >> been a practical problem here at SU in our prod clusters, however.
>> >>
>> >> On Jul 31, 2010 3:18 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>> >> > Hi,
>> >> > Currently load balancing only considers region count.
>> >> > See ServerManager.getAverageLoad()
>> >> >
>> >> > I think load balancing should consider the following three factors
>> >> > for each RS:
>> >> > 1. number of regions it hosts
>> >> > 2.
number of requests it serves within a given period
>> >> > 3. how close usedHeap is to maxHeap
>> >> >
>> >> > Please comment on how we should weigh the above three factors in
>> >> > deciding the regions to offload from each RS.
>> >> >
>> >> > Thanks
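[Editor's note: Stack's suggestion above to run with GC logging enabled can be done with the standard HotSpot flags of that era, added to the region server JVM options. This is only an example; the `HBASE_OPTS` variable in `hbase-env.sh` is the usual place, and the log path is a placeholder.]

```shell
# Example only: turn on verbose GC logging for the HBase daemons
# (typically placed in conf/hbase-env.sh). Log path is a placeholder.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-regionserver.log"
```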
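[Editor's note: the stack trace shows the OOME happening at `ByteBuffer.allocate` inside `HBaseServer$Connection.readAndProcess`, which fits Stack's "bad record with a very large size" theory: a server that allocates a buffer sized by a length field read off the wire can be driven to OOME by one corrupt frame. The sketch below is hypothetical illustration of that failure mode and a cheap guard, not HBase's actual code; `FrameReader`, `allocateFrame`, and the 64 MB cap are invented.]

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: guard a wire-declared frame length before
// allocating, so one corrupt record cannot OOME the whole server.
public class FrameReader {
    // Assumed sanity cap on a single frame; 64 MB is an arbitrary choice.
    static final int MAX_FRAME = 64 * 1024 * 1024;

    static ByteBuffer allocateFrame(int declaredLength) {
        // Reject negative or absurdly large declared sizes up front,
        // instead of letting ByteBuffer.allocate blow the heap.
        if (declaredLength < 0 || declaredLength > MAX_FRAME) {
            throw new IllegalArgumentException(
                "suspicious frame length: " + declaredLength);
        }
        return ByteBuffer.allocate(declaredLength);
    }
}
```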
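[Editor's note: Ted's three factors plus Ryan's caveat (the balancer must be stable and conservative, since rebalancing is not a zero-impact operation) can be combined in a weighted score with a dead band, so regions are moved only when a server clearly exceeds the cluster average. This is a hypothetical sketch, not HBase's balancer; the `LoadScore` class, the weights, and the 20% slop are invented for illustration.]

```java
// Hypothetical multi-factor load score for a region server.
public class LoadScore {
    // Weights for: region count, request rate, heap pressure (sum to 1.0).
    // Purely illustrative choices.
    static final double W_REGIONS = 0.4, W_REQUESTS = 0.3, W_HEAP = 0.3;
    // Dead band: tolerate this much imbalance before moving anything,
    // keeping the balancer conservative and stable.
    static final double SLOP = 0.2;

    /** Load relative to cluster averages; ~1.0 means "about average". */
    static double score(int regions, double avgRegions,
                        double requests, double avgRequests,
                        long usedHeap, long maxHeap) {
        double r = regions / avgRegions;
        double q = avgRequests == 0 ? 0 : requests / avgRequests;
        double h = (double) usedHeap / maxHeap;  // heap pressure, 0..1
        return W_REGIONS * r + W_REQUESTS * q + W_HEAP * h;
    }

    /** Offload only when clearly above the cluster mean. */
    static boolean shouldOffload(double score, double clusterMean) {
        return score > clusterMean * (1.0 + SLOP);
    }
}
```

With the crashed server's numbers from the metrics dump (regions=9, usedHeap=3929, maxHeap=3973), the heap term alone contributes nearly its full weight, which is exactly the signal a region-count-only balancer misses.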