Looks like the skip and indexInterval are set up to read correctly; I did
not understand that there were two conf options, one for write and one for
read.
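For reference, the two options as I understand them now (a minimal sketch;
the values below are only examples, not claims about the defaults):

import org.apache.hadoop.conf.Configuration;

public class MapFileIndexConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // write side: MapFile.Writer adds an index entry every this-many rows
    conf.setInt("io.map.index.interval", 32);
    // read side: MapFile.Reader skips this many written entries between
    // the ones it actually keeps in memory
    conf.setInt("io.map.index.skip", 0);
  }
}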
The index files are block compressed. Stack was not sure, but I found in
the code that they are always compressed.
I ran a job on 80 mapfiles to write 80 new files with non-compressed
indexes, and it still took ~4X the size of the uncompressed index files to
load them into memory.
That could have to do with the way they grow the arrays storing the
positions of the keys, starting on line 333. Looks like they are copying
the arrays into a new one 150% bigger than the last as needed. I am not
sure how long Java takes before the old array is reclaimed from memory. I
have seen it recover to about ~2x the size of the uncompressed index
files, but only twice.
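Here is roughly the growth pattern I am talking about (not the exact
MapFile code, just the shape of it):

class IndexGrowthSketch {
  private long[] positions = new long[1024];
  private int count = 0;

  void add(long position) {
    if (count == positions.length) {
      // new array is 150% of the old one; while the copy runs both arrays
      // are live at once, and the old one only goes away when GC gets to it
      long[] grown = new long[(positions.length * 3) / 2];
      System.arraycopy(positions, 0, grown, 0, count);
      positions = grown;
    }
    positions[count++] = position;
  }
}

So right at the copy you are holding the old array plus the new one, which
is where I think the extra overhead comes from.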
I am testing by creating the files with an MR job and then loading the map
files in a simple program that opens the files and finds the midkey, so
the index gets read into memory, while watching the top command. I also
added -Xloggc:/tmp/gc.log and watched the memory usage go up; it matches
top for the most part. I tried running System.gc() to force a cleanup of
the memory, but it did not seem to help any.
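The test program is basically something like the sketch below (the class
name and argument handling are just placeholders of mine); I run it with
-Xloggc:/tmp/gc.log and watch top:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;

public class MidKeyTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    List<MapFile.Reader> readers = new ArrayList<MapFile.Reader>();
    for (String dir : args) {  // one map file directory per argument
      MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
      System.out.println(dir + " midkey: " + reader.midKey()); // forces the index load
      readers.add(reader);  // keep the reader open so the index stays in memory
    }
    System.gc();                  // did not seem to free much for me
    Thread.sleep(Long.MAX_VALUE); // park so top and the gc log can be watched
  }
}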
Billy
"Billy Pearson" <[EMAIL PROTECTED]>
wrote in message news:[EMAIL PROTECTED]
I have been looking over the MapFile class in Hadoop for memory problems
and think I might have found an index bug.
org.apache.hadoop.io.MapFile
line 202
if (size % indexInterval == 0) { // add an index entry
This is where it writes the index, adding an entry only every
indexInterval rows. Then on the loading of the index,
line 335
if (skip > 0) {
skip--;
continue; // skip this entry
we are only reading in every skip-th entry.
So with the default of 32, I think in HBase we are only writing an index
entry to the index file every 32 rows and then only reading back every
32nd of those, so we only get an index entry every 1024 rows.
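If I am reading it right, the numbers multiply out like this (assuming
both values are 32):

  rows per index entry written to the index file:  32
  written entries per entry kept when reading:     32
  rows per index entry actually held in memory:    32 * 32 = 1024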
Take a look and confirm, and we can open a bug against Hadoop about it.
Billy