Billy Pearson wrote:
hey guys, there is a variable in Hadoop that can help without having to
change the index interval: io.map.index.skip.
This can be changed to lower memory usage without having to wait until
the map files are compacted again, and you can change it as needed.
Yeah.
Here is what our hbase.io.index.interval property currently says:
<property>
<name>hbase.io.index.interval</name>
<value>128</value>
<description>The interval at which we record offsets in hbase
store files/mapfiles. Default for stock mapfiles is 128. Index
files are read into memory. If there are many of them, could prove
a burden. If so play with the hadoop io.map.index.skip property and
skip every nth index member when reading back the index into memory.
The downside to a high index interval is slower access times.
</description>
</property>
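For reference, the companion stock Hadoop property looks like the following (the value shown is only an illustration; with skip set to 7, roughly one in every eight index entries is read into memory):
<property>
<name>io.map.index.skip</name>
<value>7</value>
<description>Number of index entries to skip between each entry
loaded into memory when reading the index back. Zero by default,
meaning the whole index is loaded.</description>
</property>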
One approach might have been to keep writing at a small interval and
then set index.skip to specify how much of the index to read in.
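A back-of-envelope model may make the trade-off concrete. The class below is purely illustrative (its names and numbers are mine, not HBase internals): with N keys written at interval I, the index file holds roughly N/I entries, and reading it back with a skip of k keeps only about one in every k+1 of them in memory.

```java
// Back-of-envelope model of mapfile index size vs. io.map.index.skip.
// All names and figures here are illustrative assumptions, not HBase code.
public class IndexSkipModel {

    // Index entries written for a given key count and write-time interval.
    static long entriesWritten(long numKeys, int interval) {
        return (numKeys + interval - 1) / interval; // ceiling division
    }

    // Index entries held in memory when every (skip+1)-th entry is loaded.
    static long entriesInMemory(long written, int skip) {
        return (written + skip) / (skip + 1); // ceiling division
    }

    public static void main(String[] args) {
        long keys = 10_000_000L;
        long written = entriesWritten(keys, 128); // interval 128 -> 78,125 entries
        long loaded = entriesInMemory(written, 7); // skip 7 keeps ~1 in 8
        System.out.println(written + " written, " + loaded + " loaded");
    }
}
```

So a small write-time interval plus a read-time skip trades a larger index file on disk for a tunable in-memory footprint, at the cost of a longer worst-case scan between indexed positions.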
St.Ack
"stack" <[email protected]> wrote in message
news:[email protected]...
Andrew Purtell wrote:
Based on this, leaving the Hadoop default (128) might be the
way to go.
Sounds good. I made HBASE-1070 to do the above for TRUNK and branch.
Later, it might make sense to dynamically set the index
interval based on the distribution of cell sizes in the mapfile,
according to some parameterized formula that could be adjusted
with config variable(s). This could be done during compaction.
It would also make sense to consider the distribution of key
lengths. Or there could be other similar tricks implemented to
keep index sizes down.
Made HBASE-1071. We should be able to do it at flush time, not just
at compaction time, since we have a count of keys and could keep a
running tally of notable key attributes on memcache insert, so we
would have these to plug into the formula at flush time.
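To illustrate, a formula of that kind might look like the sketch below. Everything here is hypothetical (the class, method names, budget parameter, and the 8-byte offset assumption are mine, not anything from HBASE-1071): keep a running count and total key length on memcache insert, then pick an interval at flush so the in-memory index stays under a byte budget.

```java
// Hypothetical sketch of choosing an index interval at flush time from a
// running tally of key statistics. Names and formula are illustrative only,
// not the actual HBASE-1071 design.
public class FlushIntervalChooser {
    private long keyCount;      // bumped on every memcache insert
    private long totalKeyBytes; // running sum of key lengths

    void onInsert(byte[] key) {
        keyCount++;
        totalKeyBytes += key.length;
    }

    // Pick an interval so the index costs at most indexBudgetBytes in memory,
    // assuming each index entry stores roughly one key plus an 8-byte offset.
    int chooseInterval(long indexBudgetBytes, int minInterval) {
        if (keyCount == 0) return minInterval;
        long avgEntry = totalKeyBytes / keyCount + 8; // avg key + offset
        long maxEntries = Math.max(1, indexBudgetBytes / avgEntry);
        long interval = (keyCount + maxEntries - 1) / maxEntries; // ceiling
        return (int) Math.max(minInterval, interval);
    }
}
```

The point is only that the inputs the formula needs (key count, key sizes) are cheap to maintain incrementally, which is why flush time works as well as compaction time.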
In my opinion, for 0.20.0, MapFile should be brought local so
we can begin hacking it over time into a new file format. I
was thinking that designing and/or implementing a wholly new
format such as TFile would block I/O improvements for a long
time.
I don't know. This would be the safer tack for sure, but let's at
least keep open the possibility of moving to a completely new
file format in the 0.20.0 timeframe.
St.Ack