Billy Pearson wrote:
hey guys, there is a variable in Hadoop that can help without having to
change the index interval: io.map.index.skip.
This can be changed to lower memory usage without having to wait until
the map files are compacted again, and you can change it as needed.
Yeah.
Here is what our hbase.io.index.interval property currently says:
<property>
<name>hbase.io.index.interval</name>
<value>128</value>
<description>The interval at which we record offsets in hbase
store files/mapfiles. Default for stock mapfiles is 128. Index
files are read into memory. If there are many of them, could prove
a burden. If so play with the hadoop io.map.index.skip property and
skip every nth index member when reading back the index into memory.
The downside to a high index interval is slower access times.
</description>
</property>
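For reference, the companion stock Hadoop property looks like the following (the value shown is only an illustration; with skip set to 7, roughly one in every eight index entries is read into memory):
<property>
<name>io.map.index.skip</name>
<value>7</value>
<description>Number of index entries to skip between each entry
loaded into memory when reading the index back. Zero by default,
meaning the whole index is loaded.</description>
</property>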
One approach might have been to keep writing at a small interval and
then set index.skip to specify how much of the index to read in.
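A back-of-envelope model may make the trade-off concrete. The class below is purely illustrative (its names and numbers are mine, not HBase internals): with N keys written at interval I, the index file holds roughly N/I entries, and reading it back with a skip of k keeps only about one in every k+1 of them in memory.

```java
// Back-of-envelope model of mapfile index size vs. io.map.index.skip.
// All names and figures here are illustrative assumptions, not HBase code.
public class IndexSkipModel {

    // Index entries written for a given key count and write-time interval.
    static long entriesWritten(long numKeys, int interval) {
        return (numKeys + interval - 1) / interval; // ceiling division
    }

    // Index entries held in memory when every (skip+1)-th entry is loaded.
    static long entriesInMemory(long written, int skip) {
        return (written + skip) / (skip + 1); // ceiling division
    }

    public static void main(String[] args) {
        long keys = 10_000_000L;
        long written = entriesWritten(keys, 128); // interval 128 -> 78,125 entries
        long loaded = entriesInMemory(written, 7); // skip 7 keeps ~1 in 8
        System.out.println(written + " written, " + loaded + " loaded");
    }
}
```

So a small write-time interval plus a read-time skip trades a larger index file on disk for a tunable in-memory footprint, at the cost of a longer worst-case scan between indexed positions.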
St.Ack
"stack" <[email protected]> wrote in message
news:[email protected]...
Andrew Purtell wrote:
Based on this, leaving the Hadoop default (128) might be the
way to go.
Sounds good. I made HBASE-1070 to do the above for TRUNK and branch.
Later, it might make sense to dynamically set the index
interval based on the distribution of cell sizes in the mapfile,
according to some parameterized formula that could be adjusted
with config variable(s). This could be done during compaction.
It would also make sense to consider the distribution of key
lengths. Or there could be other similar tricks implemented to
keep index sizes down.
Made HBASE-1071. We should be able to do it at flush time, not just
at compaction time, since we have a count of keys and could keep a
running tally of notable key attributes on memcache insert, so we
would have these to plug into the formula at flush time.
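To illustrate, a formula of that kind might look like the sketch below. Everything here is hypothetical (the class, method names, budget parameter, and the 8-byte offset assumption are mine, not anything from HBASE-1071): keep a running count and total key length on memcache insert, then pick an interval at flush so the in-memory index stays under a byte budget.

```java
// Hypothetical sketch of choosing an index interval at flush time from a
// running tally of key statistics. Names and formula are illustrative only,
// not the actual HBASE-1071 design.
public class FlushIntervalChooser {
    private long keyCount;      // bumped on every memcache insert
    private long totalKeyBytes; // running sum of key lengths

    void onInsert(byte[] key) {
        keyCount++;
        totalKeyBytes += key.length;
    }

    // Pick an interval so the index costs at most indexBudgetBytes in memory,
    // assuming each index entry stores roughly one key plus an 8-byte offset.
    int chooseInterval(long indexBudgetBytes, int minInterval) {
        if (keyCount == 0) return minInterval;
        long avgEntry = totalKeyBytes / keyCount + 8; // avg key + offset
        long maxEntries = Math.max(1, indexBudgetBytes / avgEntry);
        long interval = (keyCount + maxEntries - 1) / maxEntries; // ceiling
        return (int) Math.max(minInterval, interval);
    }
}
```

The point is only that the inputs the formula needs (key count, key sizes) are cheap to maintain incrementally, which is why flush time works as well as compaction time.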
In my opinion, for 0.20.0, MapFile should be brought local so
we can begin hacking it over time into a new file format. I
was thinking that designing and/or implementing a wholly new
format such as TFile would block I/O improvements for a long
time.
I don't know. This would be the safer tack for sure, but let's at
least keep open the possibility of moving to a completely new
file format in the 0.20.0 timeframe.
St.Ack