On Wed, Nov 5, 2008 at 11:52 PM, Billy Pearson <[EMAIL PROTECTED]> wrote:

> I ran a job on 80 mapfiles to write 80 new files with non-compressed
> indexes, and loading them into memory still took ~4X the size of the
> uncompressed index files.
Sorry Billy, how did you specify non-compressed indices? What took 4X
memory? The non-compressed index?

> It could have to do with the way they grow the arrays storing the pos of
> the keys, starting on line 333. Looks like they are copying arrays and
> making a new one 150% bigger than the last as needed. Not sure about java
> and how long before the old array will be recovered from memory. I have
> seen it recover down to about ~2x the size of the uncompressed index
> files, but only twice.

Unreferenced java objects will be let go variously. Depends on your JVM
configuration. Usually they'll be let go when the JVM needs the memory.
(Links like this may be of help:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc.oom)

> I am testing by creating the files with a MR job and then loading the map
> files in a simple program that opens the files and finds the midkey so the
> index gets read into memory, while watching the top command. I also added
> -Xloggc:/tmp/gc.log and watched the memory usage go up; it matches top for
> the most part.
>
> I tried running System.gc() to force a clean up of the memory, but it did
> not seem to help any.

Yeah, it's just a suggestion. The gc.log should give you a better clue of
what's going on. What's it saying? Lots of small GCs and then a full GC
every so often? Is the heap discernibly growing?

You could enable JMX for the JVM and connect with jconsole. This can give
you a more detailed picture of the heap.

St.Ack
P.S. Check out HBASE-722 if you have a sec.

> Billy
>
>
> "Billy Pearson" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> I have been looking over the MapFile class in hadoop for memory problems
>> and think I might have found an index bug.
>>
>> org.apache.hadoop.io.MapFile
>> line 202
>>
>>   if (size % indexInterval == 0) {            // add an index entry
>>
>> This is where the index is written, adding an entry every indexInterval
>> rows.
>>
>> Then on the loading of the index, line 335:
>>
>>   if (skip > 0) {
>>     skip--;
>>     continue;                             // skip this entry
>>
>> We are only reading in one of every skip entries.
>>
>> So with the default of 32, I think in hbase we are only writing an index
>> entry to the index file every 32 rows, and then only reading back every
>> 32nd of those, so we only get an index entry every 1024 rows.
>>
>> Take a look and confirm and we can open a bug on hadoop about it.
>>
>> Billy
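
For completeness, a minimal standalone sketch of the write/read interaction
Billy describes: the writer keeps one index entry per indexInterval rows, and
the reader then skips a further run of the written entries for each one it
keeps, so the two settings compound. The class name and the 32/32 values
below are assumptions for illustration, taken from Billy's figures rather
than confirmed Hadoop defaults, and the loop is just the pattern, not the
MapFile source:

  // Hypothetical illustration, not Hadoop source: shows how a write-side
  // interval and a read-side skip compound into a much sparser in-memory
  // index.
  public class IndexSkipSketch {
    public static void main(String[] args) {
      int indexInterval = 32;   // assumed writer interval (Billy's figure)
      int skipSetting = 32;     // assumed reader skip (Billy's figure)
      int rows = 1000000;

      int written = 0, kept = 0, skip = 0;
      for (int size = 0; size < rows; size++) {
        if (size % indexInterval == 0) {    // writer side, cf. MapFile line 202
          written++;
          if (skip > 0) {                   // reader side, cf. MapFile line 335
            skip--;
            continue;
          }
          kept++;
          skip = skipSetting;
        }
      }
      System.out.println("index entries written:        " + written);
      System.out.println("index entries kept in memory: " + kept);
      System.out.println("rows per in-memory entry:     " + (rows / kept));
    }
  }

With those assumed values the kept entries end up roughly a thousand rows
apart, which is the compounding Billy is pointing at; whether 32 really is
the default for either setting is worth confirming against the MapFile
source before opening the hadoop issue.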

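On the ~4X observation and the array growth at line 333, a hedged sketch of
the grow-by-copy pattern (again not the Hadoop source, and IndexGrowthSketch
is a made-up name): each time the position array fills, a new array 1.5x the
size is allocated and the old one copied over, so while the copy is in flight
both arrays are live, and the discarded generations only go away whenever the
collector gets to them.

  import java.util.Arrays;

  // Hypothetical illustration of a grow-by-50% array, as described for the
  // position array around MapFile line 333. Not the Hadoop source.
  public class IndexGrowthSketch {
    public static void main(String[] args) {
      long[] positions = new long[1024];
      int count = 0;
      for (long entry = 0; entry < 1000000; entry++) {
        if (count == positions.length) {
          // The new array is 1.5x the old one; the old array only becomes
          // garbage after this line, so old + new are briefly live together.
          positions = Arrays.copyOf(positions, positions.length * 3 / 2);
        }
        positions[count++] = entry;   // stand-in for a real file position
      }
      System.out.println("entries: " + count
          + ", final capacity: " + positions.length);
    }
  }

That pattern by itself only accounts for a transient ~2.5x of the data held,
so if top is showing a steady ~4X it would be worth checking the gc.log for
whether the intermediate arrays are actually being collected, as suggested
above.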