This is similar to a mail sent by another user to the group a couple of months back. I am quite new to HBase and have been trying to run a basic experiment with it.
1. I am trying to load 200 million records of around 15 KB each: one column value is around 14 KB and the rest of the 100 column values are 8 bytes each. The 120 columns are grouped as 10 qualifiers x 12 families (I hope I got my jargon right). Note that only one value per doc is large compared to the other values.
2. The data is uncompressed, and each value is selected uniformly at random.
3. I used a map-reduce job to load a data file on HDFS into the database. Soon after the job finished, the region servers crashed with an OOM exception.

I have attached the conf along with the email. Can you guys point out any anomaly in my settings? I have set a heap size of 3 GB; with anything significantly more, 32-bit Java doesn't run.

Below is part of the trace from the logs in one of the RSs:

2010-05-12 19:22:45,068 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=8.43782MB (8847696), Free=1791.2247MB (1878235312), Max=1799.6626MB (1887083008), Counts: Blocks=1, Access=16947, Hit=52, Miss=16895, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.3068389603868127%, Miss Ratio=99.69316124916077%, Evicted/Run=NaN
2010-05-12 19:22:45,069 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col5/7617863559659933969, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:45,075 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/1328113038200437659, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:45,078 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col6/6484804359703635950, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:45,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/1673569837212457160, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:45,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col7/4737399093829085995, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:47,238 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/8446828932792437464, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:47,241 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col8/974386128174268353, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:48,804 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/2096232603557969237, isReference=false, sequence id=2470632548, length=8456716, majorCompaction=false
2010-05-12 19:22:48,807 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/1651418343/col9/7088206045660348092, isReference=false, sequence id=2960732840, length=19861, majorCompaction=false
2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegion: region DocData,4824176,1273625075099/1651418343 available; sequence id is 2960732841
2010-05-12 19:22:48,808 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: DocData,40682172,1273607630618
2010-05-12 19:22:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region DocData,40682172,1273607630618, encoded=271889952
2010-05-12 19:22:50,924 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/4859380626868896307, isReference=false, sequence id=2959849236, length=337563, majorCompaction=false
2010-05-12 19:22:53,037 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/CONTENT/952776139755887312, isReference=false, sequence id=2082553088, length=110460013, majorCompaction=false
2010-05-12 19:22:57,404 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/DocData/271889952/col1/66449684560689857, isReference=false, sequence id=2959849236, length=12648, majorCompaction=false
2010-05-12 19:23:16,165 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocData,40682172,1273607630618
java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
    at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1369)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1626)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1372)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:848)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:793)
    at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
    at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
    at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
    at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1549)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:312)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
    at java.lang.Thread.run(Thread.java:619)
2010-05-12 19:23:18,246 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
    at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1369)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1626)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1372)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:848)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:793)
    at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
    at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
    at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
    at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1549)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:312)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1564)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1531)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1451)
    at java.lang.Thread.run(Thread.java:619)
2010-05-12 19:23:18,246 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=942, stores=9411, storefiles=19887, storefileIndexSize=182, memstoreSize=0, compactionQueueSize=0, usedHeap=2999, maxHeap=2999, blockCacheSize=8847696, blockCacheFree=1878235312, blockCacheCount=1, blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
2010-05-12 19:23:18,247 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2010-05-12 19:23:18,254 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
2010-05-12 19:23:18,255 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting

And so on (the region server has a total of 100 handlers).
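
To put the "Dump of metrics" line above in perspective, here is a minimal sketch in plain Python (not HBase code; the helper name parse_metrics is made up for illustration). It pulls the counters out of that log line and derives a few ratios, plus a rough total for the data set from point 1, assuming 1 KB = 1024 bytes and exactly 15 KB per record.

```python
import re

# Metrics copied from the "Dump of metrics" log line above.
METRICS = (
    "request=0.0, regions=942, stores=9411, storefiles=19887, "
    "storefileIndexSize=182, memstoreSize=0, compactionQueueSize=0, "
    "usedHeap=2999, maxHeap=2999, blockCacheSize=8847696, "
    "blockCacheFree=1878235312, blockCacheCount=1, blockCacheHitRatio=0, "
    "fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0"
)


def parse_metrics(line):
    """Turn 'key=value, key=value, ...' into a dict of floats (hypothetical helper)."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", line)}


m = parse_metrics(METRICS)

# Ratios for the single region server that produced the log.
stores_per_region = m["stores"] / m["regions"]
files_per_store = m["storefiles"] / m["stores"]
heap_used_fraction = m["usedHeap"] / m["maxHeap"]

# Rough size of the whole data set (assumption: 15 KB per record, 1 KB = 1024 bytes).
records = 200_000_000
bytes_per_record = 15 * 1024
total_tib = records * bytes_per_record / 1024**4

print(f"stores per region:   {stores_per_region:.2f}")
print(f"store files / store: {files_per_store:.2f}")
print(f"heap used fraction:  {heap_used_fraction:.2f}")
print(f"raw data set size:   {total_tib:.2f} TiB")
```

For the numbers in the log this works out to roughly 10 stores per region and about 2 store files per store on this one server, with usedHeap equal to maxHeap, and a raw data set a little under 3 TiB.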