Andrew Purtell wrote:
> Based on this, leaving the Hadoop default (128) might be the
> way to go.

Sounds good. I made HBASE-1070 to do the above for TRUNK and branch.
> Later, maybe it would make sense to dynamically set the index
> interval based on the distribution of cell sizes in the mapfile at
> some future time, according to some parameterized formula that
> could be adjusted with config variable(s). This could be done
> during compaction. Would make sense to also consider the
> distribution of key lengths. Or there could be other similar
> tricks implemented to keep index sizes down.

Made HBASE-1071. We should be able to do it at flush time, not just when compacting: we already have a count of keys, and we could keep a running tally of notable key attributes on each memcache insert so they are on hand to plug into the formula at flush time.
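To make the flush-time idea concrete, here is a rough sketch in Python (not HBase code). Everything in it is hypothetical: the `KeyStatsTally` class, the `choose_index_interval` function, both budget parameters, and the formula itself are made up for illustration and are not what HBASE-1071 specifies. The tally is bumped on every memcache insert; at flush the interval starts from the Hadoop default of 128, is shrunk when cells are big (so a seek does not scan too far past an index entry), and is grown when there are many or long keys (so the index stays small).

```python
import math
from dataclasses import dataclass


@dataclass
class KeyStatsTally:
    """Hypothetical running tally kept alongside the memcache."""
    count: int = 0
    total_key_bytes: int = 0
    total_value_bytes: int = 0

    def record(self, key_len: int, value_len: int) -> None:
        """Call on every memcache insert with the serialized key/value sizes."""
        self.count += 1
        self.total_key_bytes += key_len
        self.total_value_bytes += value_len

    def mean_key_bytes(self) -> float:
        return self.total_key_bytes / self.count if self.count else 0.0

    def mean_entry_bytes(self) -> float:
        if not self.count:
            return 0.0
        return (self.total_key_bytes + self.total_value_bytes) / self.count


def choose_index_interval(
    tally: KeyStatsTally,
    default_interval: int = 128,        # Hadoop MapFile default
    index_budget_bytes: int = 1 << 20,  # hypothetical: max in-memory index size
    scan_budget_bytes: int = 64 << 10,  # hypothetical: max bytes scanned past an index entry per seek
) -> int:
    """Hypothetical flush-time formula for the mapfile index interval."""
    if tally.count == 0:
        return default_interval
    # Big cells: cap the interval so one seek scans at most ~scan_budget_bytes.
    scan_cap = max(1, int(scan_budget_bytes // max(1.0, tally.mean_entry_bytes())))
    # Many/long keys: the index holds about count / interval keys of mean_key_bytes
    # each, so raise the interval until that fits in index_budget_bytes.
    index_floor = math.ceil(tally.count * tally.mean_key_bytes() / index_budget_bytes)
    # On conflict, bounding index memory wins over bounding scan length.
    return max(1, index_floor, min(default_interval, scan_cap))
```

With small cells and modest key counts this just returns 128, so the default is left alone unless the tallied sizes push it one way or the other. Where the two bounds pull in opposite directions, which one should win is exactly the kind of thing the config variable(s) would need to decide.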

> In my opinion, for 0.20.0, MapFile should be brought local so
> we can begin hacking it over time into a new file format. I
> was thinking that designing and/or implementing a wholly new
> format such as TFile would block I/O improvements for a long
> time.
I don't know. This would be the safer tack for sure, but let's at least keep open the possibility of moving to a completely new file format in the 0.20.0 timeframe.

St.Ack
