On Wed, Nov 5, 2008 at 11:52 PM, Billy Pearson <[EMAIL PROTECTED]> wrote:

> I ran a job on 80 mapfiles to write 80 new files with non-compressed
> indexes, and loading them into memory still took ~4X the size of the
> uncompressed index files.
Sorry Billy, how did you specify non-compressed indices? What took 4X
memory? The non-compressed index?

> It could have to do with the way they grow the arrays storing the pos of
> the keys, starting on line 333. Looks like they are copying arrays and
> making a new one 150% bigger than the last as needed. Not sure about java
> and how long before the old array will be recovered from memory. I have
> seen it recover down to about ~2x the size of the uncompressed index
> files, but only twice.

Unreferenced java objects will be let go variously. Depends on your JVM
configuration. Usually they'll be let go when the JVM needs the memory.
(Links like this may be of help:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc.oom)

> I am testing by creating the files with a MR job and then loading the map
> files in a simple program that opens the files and finds the midkey so the
> index gets read into memory, while watching the top command. I also added
> -Xloggc:/tmp/gc.log and watched the memory usage go up; it matches top for
> the most part.
>
> I tried running System.gc() to force a clean up of the memory, but it did
> not seem to help any.

Yeah, it's just a suggestion. The gc.log should give you a better clue of
what's going on. What's it saying? Lots of small GCs and then a full GC
every so often? Is the heap discernibly growing?

You could enable JMX for the JVM and connect with jconsole. This can give
you a more detailed picture of the heap.

St.Ack
P.S. Check out HBASE-722 if you have a sec.

> Billy
>
>
> "Billy Pearson" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> I have been looking over the MapFile class in hadoop for memory problems
>> and think I might have found an index bug.
>>
>> org.apache.hadoop.io.MapFile
>> line 202
>>
>>   if (size % indexInterval == 0) {            // add an index entry
>>
>> This is where the index is written, adding an entry every indexInterval
>> rows.
>>
>> Then on the loading of the index, line 335:
>>
>>   if (skip > 0) {
>>     skip--;
>>     continue;                             // skip this entry
>>
>> We are only reading in one of every skip entries.
>>
>> So with the default of 32, I think in hbase we are only writing an index
>> entry to the index file every 32 rows, and then only reading back every
>> 32nd of those, so we only get an index entry every 1024 rows.
>>
>> Take a look and confirm and we can open a bug on hadoop about it.
>>
>> Billy
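
For completeness, a minimal standalone sketch of the write/read interaction
Billy describes: the writer keeps one index entry per indexInterval rows, and
the reader then skips a further run of the written entries for each one it
keeps, so the two settings compound. The class name and the 32/32 values
below are assumptions for illustration, taken from Billy's figures rather
than confirmed Hadoop defaults, and the loop is just the pattern, not the
MapFile source:

  // Hypothetical illustration, not Hadoop source: shows how a write-side
  // interval and a read-side skip compound into a much sparser in-memory
  // index.
  public class IndexSkipSketch {
    public static void main(String[] args) {
      int indexInterval = 32;   // assumed writer interval (Billy's figure)
      int skipSetting = 32;     // assumed reader skip (Billy's figure)
      int rows = 1000000;

      int written = 0, kept = 0, skip = 0;
      for (int size = 0; size < rows; size++) {
        if (size % indexInterval == 0) {    // writer side, cf. MapFile line 202
          written++;
          if (skip > 0) {                   // reader side, cf. MapFile line 335
            skip--;
            continue;
          }
          kept++;
          skip = skipSetting;
        }
      }
      System.out.println("index entries written:        " + written);
      System.out.println("index entries kept in memory: " + kept);
      System.out.println("rows per in-memory entry:     " + (rows / kept));
    }
  }

With those assumed values the kept entries end up roughly a thousand rows
apart, which is the compounding Billy is pointing at; whether 32 really is
the default for either setting is worth confirming against the MapFile
source before opening the hadoop issue.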

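On the ~4X observation and the array growth at line 333, a hedged sketch of
the grow-by-copy pattern (again not the Hadoop source, and IndexGrowthSketch
is a made-up name): each time the position array fills, a new array 1.5x the
size is allocated and the old one copied over, so while the copy is in flight
both arrays are live, and the discarded generations only go away whenever the
collector gets to them.

  import java.util.Arrays;

  // Hypothetical illustration of a grow-by-50% array, as described for the
  // position array around MapFile line 333. Not the Hadoop source.
  public class IndexGrowthSketch {
    public static void main(String[] args) {
      long[] positions = new long[1024];
      int count = 0;
      for (long entry = 0; entry < 1000000; entry++) {
        if (count == positions.length) {
          // The new array is 1.5x the old one; the old array only becomes
          // garbage after this line, so old + new are briefly live together.
          positions = Arrays.copyOf(positions, positions.length * 3 / 2);
        }
        positions[count++] = entry;   // stand-in for a real file position
      }
      System.out.println("entries: " + count
          + ", final capacity: " + positions.length);
    }
  }

That pattern by itself only accounts for a transient ~2.5x of the data held,
so if top is showing a steady ~4X it would be worth checking the gc.log for
whether the intermediate arrays are actually being collected, as suggested
above.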