I've been looking over the MapFile class in Hadoop for memory problems, and I think I might have found an index bug.
org.apache.hadoop.io.MapFile
line 202
if (size % indexInterval == 0) { // add an index entry
This is where the writer adds index entries, recording one entry for every indexInterval rows appended.
Then, when the index is loaded:
line 335
if (skip > 0) {
  skip--;
  continue; // skip this entry
}
So when loading we only keep one index entry out of every skip, throwing the rest away.
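For context, here's a self-contained sketch of how I read that loading loop (my paraphrase, not the actual MapFile source; the resetting of skip after a kept entry is my assumption from the surrounding code):

import java.util.ArrayList;
import java.util.List;

public class IndexLoadSketch {
  // Paraphrased shape of the index-loading loop: skip `indexSkip`
  // entries, keep one, reset the counter, and repeat.
  static List<Long> loadIndex(List<Long> indexFileEntries, int indexSkip) {
    List<Long> inMemory = new ArrayList<>();
    int skip = indexSkip;
    for (Long position : indexFileEntries) {
      if (skip > 0) {
        skip--;
        continue; // skip this entry
      }
      skip = indexSkip; // keep this entry, then reset the counter
      inMemory.add(position);
    }
    return inMemory;
  }

  public static void main(String[] args) {
    List<Long> entries = new ArrayList<>();
    for (long i = 0; i < 100; i++) entries.add(i);
    // With indexSkip = 32, only entries 32, 65, and 98 survive:
    // one kept entry out of every 33 in the index file.
    System.out.println(loadIndex(entries, 32));
  }
}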
So with the default of 32, I think in HBase we are only writing an index entry to the index file every 32 rows, and then only keeping one of every 32 of those when loading, so we end up with an in-memory index entry only about every 32 * 32 = 1024 rows.
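To sanity-check the arithmetic, here's a quick hypothetical simulation combining both sides (the interval and skip values of 32 are the assumed defaults, and the skip reset is my reading of the code, not verbatim MapFile source):

public class IndexDensitySim {
  public static void main(String[] args) {
    final int indexInterval = 32; // assumed write-side default
    final int indexSkip = 32;     // assumed read-side skip value
    final long rows = 1000000;

    long written = 0; // entries written to the index file
    long kept = 0;    // entries surviving in the in-memory index
    int skip = indexSkip;
    for (long size = 0; size < rows; size++) {
      if (size % indexInterval == 0) { // write side: one entry per interval
        written++;
        if (skip > 0) {
          skip--;           // read side: this entry is skipped on load
        } else {
          skip = indexSkip; // this entry is kept; reset the counter
          kept++;
        }
      }
    }
    System.out.printf("rows=%d written=%d kept=%d (~1 per %d rows)%n",
        rows, written, kept, rows / kept);
  }
}

Under these assumptions it prints roughly one kept entry per 1056 rows (indexInterval * (indexSkip + 1)), the same order as the 1024 above; either way the in-memory index ends up far sparser than the index file suggests.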
Take a look and confirm, and we can open a bug against Hadoop about it.
Billy