Thanks for the prompt response. Oops, I forgot the specifics: I ran the whole thing on five region servers that also run Hadoop's data node and task tracker. Each machine has 6 TB of disk space (5 TB available for the data node and 1 TB for MR and HBase temp files), 24 GB of RAM, and a 3 GB HBase heap. How do I give HBase more RAM (are you talking about a config variable)? A 3-4 GB heap is the most that 32-bit Java can take (or am I wrong?).
AFAIK, I generated the workload synthetically, and I am pretty sure the column sizes are what I had mentioned.

>> 12 column families is at the extreme regards what we've played with, just
>> FYI.

Ah, ok. Will alter the schema then.

>> There may also be corruption in one of the storefiles given that the
>> OOME below seems to happen when we try and open a region (but the fact
>> of opening may have no relation to why the OOME).

True, but then all the region servers crashed at roughly the same time and for the exact same reason (OOME when a region was opened). Was there a spike in update traffic after the MR job finished? Or was there a compaction happening by any chance? (I don't see an explicit debug message here, though; not sure if I had the correct debug log level.)

Vidhya

On 5/13/10 11:05 AM, "Stack" <st...@duboce.net> wrote:

Hello Vidhyashankar:

How many regionservers? What version of hbase and hadoop? How much RAM on these machines in total? Can you give HBase more RAM?

Also check that you don't have an exceptional cell in your input -- one that is very much larger than the 14KB you note below.

12 column families is at the extreme regards what we've played with, just FYI. You might try with a schema that has fewer: e.g. one CF for the big cell value and all others into the second CF.

There may also be corruption in one of the storefiles given that the OOME below seems to happen when we try and open a region (but the fact of opening may have no relation to why the OOME).

St.Ack

On Thu, May 13, 2010 at 10:35 AM, Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com> wrote:
> This is similar to a mail sent by another user to the group a couple of
> months back. I am quite new to HBase and I've been trying to conduct a
> basic experiment with it.
>
> I am trying to load 200 million records, each around 15 KB: one column
> value of around 14 KB, and the rest of the 100 column values 8 bytes
> each.
> The 120 columns are grouped as 10 qualifiers x 12 families (hope I got my
> jargon right). Note that only one value per doc is large compared to the
> other values.
> The data is uncompressed, and each value is selected uniformly at random.
> I used a map-reduce job to load a data file on HDFS into the database. Soon
> after the job finished, the region servers crashed with an OOM exception.
> Below is part of the trace from the logs of one of the RSs.
>
> I have attached the conf along with the email. Can you guys point out any
> anomaly in my settings? I have set a heap size of 3 GB; 32-bit Java doesn't
> run with anything significantly more.
>
> 2010-05-12 19:22:45,068 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=8.43782MB (8847696), Free=1791.2247MB (1878235312), Max=1799.6626MB (1887083008), Counts: Blocks=1, Access=16947, Hit=52, Miss=16895, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.3068389603868127%, Miss Ratio=99.69316124916077%, Evicted/Run=NaN
> 2010-05-12 19:22:45,069 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col5/7617863559659933969, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:45,075 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/1328113038200437659, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:45,078 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/6484804359703635950, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:45,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/1673569837212457160, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:45,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/4737399093829085995, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:47,238 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/8446828932792437464, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:47,241 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/974386128174268353, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:48,804 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/2096232603557969237, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
> 2010-05-12 19:22:48,807 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/7088206045660348092, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
> 2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegion: region DocData,4824176,1273625075099/1651418343 available; sequence id is 2960732841
> 2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: DocData,40682172,1273607630618
> 2010-05-12 19:22:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region DocData,40682172,1273607630618, encoded=271889952
> 2010-05-12 19:22:50,924 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/4859380626868896307, isReference=false, sequence id=2959849236, length=337563, majorCompaction=false
> 2010-05-12 19:22:53,037 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/952776139755887312, isReference=false, sequence id=2082553088, length=110460013, majorCompaction=false
> 2010-05-12 19:22:57,404 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/col1/66449684560689857, isReference=false, sequence id=2959849236, length=12648, majorCompaction=false
> 2010-05-12 19:23:16,165 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocData,40682172,1273607630618
> java.lang.OutOfMemoryError: Java heap space
>     at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
>     at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1369)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1626)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>     at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1372)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:848)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:793)
>     at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
>     at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
>     at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
>     at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
>     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1549)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:312)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
>     at java.lang.Thread.run(Thread.java:619)
> 2010-05-12 19:23:18,246 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
>     at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
>     at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1369)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1626)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>     at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1372)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:848)
>     at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:793)
>     at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
>     at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
>     at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
>     at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
>     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1549)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:312)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
>     at java.lang.Thread.run(Thread.java:619)
> 2010-05-12 19:23:18,246 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=942, stores=9411, storefiles=19887, storefileIndexSize=182, memstoreSize=0, compactionQueueSize=0, usedHeap=2999, maxHeap=2999, blockCacheSize=8847696, blockCacheFree=1878235312, blockCacheCount=1, blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> 2010-05-12 19:23:18,247 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> 2010-05-12 19:23:18,254 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
> 2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
> And so on (the region server has a total of 100 handlers).
>
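
For a rough sense of scale, the figures quoted in this thread can be combined as in the sketch below. It is only back-of-the-envelope arithmetic, not a diagnosis. It assumes 1 KB = 1000 bytes, an even spread of data over the five region servers, and no HDFS replication or per-cell key overhead, none of which the thread states. The heap figure is presumably in MB, since it matches the 3 GB heap mentioned above. The constant and function names are made up for illustration; the metric values are copied from the "Dump of metrics" line of the one region server shown.

```python
# Back-of-the-envelope numbers from the thread; not a diagnosis.

# Workload as described in the original mail.
RECORDS = 200_000_000
RECORD_BYTES = 15 * 1000      # ~15 KB per record (assumes 1 KB = 1000 bytes)
REGION_SERVERS = 5

# Copied from the "Dump of metrics" line of one region server.
REGIONS = 942
STORES = 9411
STOREFILES = 19887
MAX_HEAP_MB = 2999            # presumably MB, matching the 3 GB heap setting


def summarize():
    """Return derived ratios; assumes data is spread evenly across servers."""
    raw_tb = RECORDS * RECORD_BYTES / 1e12
    return {
        "raw_data_tb": raw_tb,
        "raw_data_tb_per_server": raw_tb / REGION_SERVERS,
        "stores_per_region": STORES / REGIONS,
        "storefiles_per_region": STOREFILES / REGIONS,
        "heap_kb_per_storefile": MAX_HEAP_MB * 1024 / STOREFILES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the raw data comes to about 3 TB (roughly 0.6 TB per server), and the server shown was holding about 21 store files per region, which leaves on the order of 150 KB of heap per open store file.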