Billy Pearson wrote:
hey guys, there is a var in hadoop that can help without having to change the index interval: io.map.index.skip. Changing it can lower memory usage without having to wait until the map files are compacted again, and you can adjust it as needed.
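For example, something like the following in hadoop-site.xml (the value and description here are illustrative; a value of 1 loads every other index entry, roughly halving the in-memory index):

 <property>
   <name>io.map.index.skip</name>
   <value>1</value>
   <description>Number of index entries to skip between each entry
   read into memory.  Zero by default; values greater than zero let
   large mapfiles be opened using less memory.
   </description>
 </property>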

Yeah.

Here is what our hbase.io.index.interval property currently says:

 <property>
   <name>hbase.io.index.interval</name>
   <value>128</value>
   <description>The interval at which we record offsets in hbase
   store files/mapfiles.  Default for stock mapfiles is 128.  Index
   files are read into memory.  If there are many of them, they can
   prove a burden.  If so, play with the hadoop io.map.index.skip
   property to skip every nth index member when reading the index back
   into memory.  The downside to a high index interval is slower
   access times.
   </description>
 </property>

One approach might have been to keep writing the index at a small interval and then set io.map.index.skip to specify how much of the index to read into memory.
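For reference, here is a simplified sketch of the skip logic in Hadoop's MapFile.Reader when it loads the index (paraphrased, not the actual source): the whole index file is still read, but only one of every (skip + 1) entries is retained in memory.

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;

public class IndexSkipSketch {
  // Load a mapfile index, honoring io.map.index.skip.
  static void loadIndex(SequenceFile.Reader index, Configuration conf,
      ArrayList<WritableComparable> keys, ArrayList<Long> offsets)
      throws IOException {
    // Keep one of every (skip + 1) index entries.
    int keepEvery = conf.getInt("io.map.index.skip", 0) + 1;
    long seen = 0;
    LongWritable offset = new LongWritable();
    WritableComparable key = (WritableComparable)
        ReflectionUtils.newInstance(index.getKeyClass(), conf);
    while (index.next(key, offset)) {
      if (seen++ % keepEvery == 0) {
        keys.add(key);                  // retain this entry
        offsets.add(offset.get());
        key = (WritableComparable)      // fresh instance; old one is kept
            ReflectionUtils.newInstance(index.getKeyClass(), conf);
      }
      // Skipped entries are simply overwritten on the next call to next().
    }
  }
}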

St.Ack


"stack" <[email protected]> wrote in message news:[email protected]...
Andrew Purtell wrote:
Based on this, leaving the Hadoop default (128) might be the
way to go.

Sounds good.  I made HBASE-1070 to do the above for TRUNK and branch.
Later, it might make sense to dynamically set the index
interval based on the distribution of cell sizes in the mapfile,
according to some parameterized formula that could be adjusted
with config variable(s). This could be done during compaction.
It would also make sense to consider the distribution of key
lengths. Or there could be other similar tricks implemented to
keep index sizes down.

Made HBASE-1071. We should be able to do it at flush time, not just at compaction, since we have a count of keys and could keep a running tally of notable key attributes on memcache insert, so we'd have these to plug into the formula at flush time.
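A rough sketch of what that might look like (class and method names are invented here for illustration; HBASE-1071 will define the actual formula):

// Hypothetical tally kept per store; not actual HBase code.
public class IndexIntervalTally {
  private long entryCount = 0;
  private long totalKeyBytes = 0;

  // Called on every memcache insert.
  public synchronized void update(byte[] key, byte[] value) {
    entryCount++;
    totalKeyBytes += key.length;
  }

  // Called at flush (or compaction) time: pick an interval that keeps
  // the in-memory index under the given budget.  The arithmetic is a
  // stand-in for whatever parameterized formula we end up with.
  public synchronized int intervalFor(long indexMemoryBudgetBytes) {
    if (entryCount == 0) {
      return 128;                       // stock mapfile default
    }
    long avgKeySize = totalKeyBytes / entryCount;
    // Each retained index entry costs roughly one key plus an offset.
    long maxEntries = Math.max(1, indexMemoryBudgetBytes / (avgKeySize + 8));
    long interval = Math.max(1, entryCount / maxEntries);
    return (int) Math.min(interval, Integer.MAX_VALUE);
  }
}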

In my opinion, for 0.20.0, MapFile should be brought local so
we can begin hacking it over time into a new file format. I
was thinking that designing and/or implementing a wholly new
format such as TFile would block I/O improvements for a long
time.

I don't know. This would be the safer tack for sure, but let's at least keep open the possibility of moving to a completely new file format in the 0.20.0 timeframe.

St.Ack



