I've been looking over the MapFile class in Hadoop for memory problems, and I think I might have found an index bug.

org.apache.hadoop.io.MapFile
line 202
if (size % indexInterval == 0) {            // add an index entry

This is where the index is written: an entry is only added once every indexInterval rows.
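To convince myself of what the write side does, here is a tiny standalone model of that loop (my own simplification with plain ints, not the Hadoop source; row numbers stand in for keys):

    import java.util.ArrayList;
    import java.util.List;

    // Models the append path: one index entry per indexInterval rows.
    public class WriteSideModel {
        public static void main(String[] args) {
            int indexInterval = 32;               // the default I mention below
            List<Integer> indexEntries = new ArrayList<>();
            for (int size = 0; size < 200; size++) {
                if (size % indexInterval == 0) {  // add an index entry
                    indexEntries.add(size);
                }
                // ...the key/value would be appended to the data file here...
            }
            // prints: [0, 32, 64, 96, 128, 160, 192]
            System.out.println(indexEntries);
        }
    }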

Then, on the loading of the index:
line 335

         if (skip > 0) {
           skip--;
           continue;                             // skip this entry

we only read back one out of every skip entries that were written.
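And a matching model of the loader (again my own simplification, assuming the skip counter resets after each kept entry, which is how I read the surrounding code; skip would come from io.map.index.skip, I believe):

    import java.util.ArrayList;
    import java.util.List;

    // Models readIndex: skip `skip` written entries, then keep one.
    public class ReadSideModel {
        public static void main(String[] args) {
            int skipInterval = 32;                // assumed skip setting
            List<Integer> written = new ArrayList<>();
            for (int row = 0; row < 100000; row += 32) {
                written.add(row);                 // entries the writer produced
            }
            List<Integer> kept = new ArrayList<>();
            int skip = skipInterval;
            for (int entry : written) {
                if (skip > 0) {
                    skip--;
                    continue;                     // skip this entry
                }
                skip = skipInterval;              // reset, keep this one
                kept.add(entry);
            }
            // prints: kept 94 of 3125
            System.out.println("kept " + kept.size() + " of " + written.size());
        }
    }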

So with the default of 32, I think that in HBase we are only writing an entry to the index file every 32 rows, and then only reading back every 32nd of those entries,

so we only end up with an index entry in memory for roughly every 1024 rows (32 * 32).
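Back-of-the-envelope, under the same assumptions (and note the skip loop as I read it keeps one of every skip + 1 entries, so the real spacing may be 32 * 33 rather than 32 * 32, but either way the index is far sparser than intended):

    // Rough check of the compounding effect; both values assumed to be 32.
    public class EffectiveInterval {
        public static void main(String[] args) {
            int writeInterval = 32;  // rows per written index entry
            int readSkip = 32;       // written entries skipped per kept entry
            // prints: 1056
            System.out.println(writeInterval * (readSkip + 1));
        }
    }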

Take a look and confirm, and we can open a bug against Hadoop about it.

Billy
