Yes, I just hacked it so I could see what size it was without compression, so I could compare it with what it's taking in memory.

Billy

"stack" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
But it's decompressed when it's read into memory, right? So the size will be the same in memory whether it was compressed in the filesystem or not? Or am I
missing something, Billy?
St.Ack

On Thu, Nov 6, 2008 at 7:55 AM, Billy Pearson <[EMAIL PROTECTED]>wrote:

There is no method to change the compression of the index; it's just always
block compressed.
I hacked the code and changed it to non-compressed so I could get a size
of the index without compression.
Opening all 80 mapfiles took 4x the memory of the uncompressed size
of all the index files.


"stack" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]

 On Wed, Nov 5, 2008 at 11:52 PM, Billy Pearson
<[EMAIL PROTECTED]>wrote:



I ran a job on 80 mapfiles to write 80 new files with non-compressed
indexes, and it still took ~4X the memory of the sizes of the uncompressed
index files to load into memory



Sorry Billy, how did you specify non-compressed indices?  What took 4X
memory?  The non-compressed index?


It could have to do with the way they grow the arrays storing the positions of the
keys, starting on line 333.
Looks like they are copying arrays and making a new one 150% bigger than
the last as needed.
Not sure how long it takes Java to recover the old array
from memory.
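
Roughly the pattern I mean is something like this (my own sketch of the idea, not the actual MapFile source; the names and the 1.5x growth factor are just illustrative):

  // grow-by-copy pattern: when the positions array fills up, allocate a
  // bigger one and copy the old contents across
  long[] positions = new long[1024];
  int count = 0;

  void add(long position) {
    if (count == positions.length) {
      long[] grown = new long[(positions.length * 3) / 2];   // ~1.5x the old size
      System.arraycopy(positions, 0, grown, 0, count);
      positions = grown;      // the old array is garbage until a GC reclaims it
    }
    positions[count++] = position;
  }

During each grow you briefly hold both the old and the new array, and the discarded ones sit around until a GC runs, so that could account for part of the extra memory.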


I have seen it recover down to about ~2x the size of the uncompressed
index files, but only twice.



Unreferenced Java objects will be let go variously; it depends on your JVM configuration. Usually they'll be let go when the JVM needs the memory (links
like this may be of help:

http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc.oom
)



I am testing by creating the files with an MR job and then loading the map
files in a simple program that opens the files and finds the midkey so the
index gets read into memory, while watching the top command.
I also added -Xloggc:/tmp/gc.log and watched the memory usage go up; it
matches top for the most part.

I tried running System.gc() to force a cleanup of the memory, but it did not
seem to help any.
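
The test program is roughly along these lines (a simplified sketch of what I described above; the class name is made up and the mapfile paths come in as arguments):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.MapFile;

  public class IndexMemoryTest {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      MapFile.Reader[] readers = new MapFile.Reader[args.length];
      for (int i = 0; i < args.length; i++) {
        readers[i] = new MapFile.Reader(fs, args[i], conf);
        readers[i].midKey();         // forces the index to be read into memory
      }
      System.out.println("all indexes loaded, watching memory now");
      Thread.sleep(Long.MAX_VALUE);  // keep the process alive so top/gc.log can be watched
    }
  }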


Yeah, it's just a suggestion.  The gc.log should give you a better clue of
what's going on.  What's it saying?  Lots of small GCs and then a full GC
every so often?  Is the heap discernibly growing?  You could enable JMX
for the JVM and connect with jconsole.  This can give you a more detailed
picture of the heap.
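
For example, something along these lines on the command line would give you both the GC log and a JMX port that jconsole can attach to (the port number is just a placeholder, and you'd want authentication/SSL on anywhere that matters):

  java -verbose:gc -XX:+PrintGCDetails -Xloggc:/tmp/gc.log \
       -Dcom.sun.management.jmxremote.port=10102 \
       -Dcom.sun.management.jmxremote.authenticate=false \
       -Dcom.sun.management.jmxremote.ssl=false \
       YourTestProgram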

St.Ack
P.S. Check out HBASE-722 if you have a sec.



 Billy


"Billy Pearson" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]

I have been looking over the MapFile class in Hadoop for memory problems, and
I think I might have found an index bug.

org.apache.hadoop.io.MapFile
line 202
if (size % indexInterval == 0) {            // add an index entry

This is where it writes the index, only adding an entry every indexInterval
rows.

Then, on the loading of the index, line 335:

       if (skip > 0) {
         skip--;
         continue;                             // skip this entry

we only read in every skip-th entry,

so with the default of 32, I think in HBase we are only writing an index entry
to the index file every 32 rows, and then only reading back every 32nd of
those,

so we only get an index entry every 1024 rows.
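
Here is a little standalone sketch of that arithmetic (my own code, not Hadoop's; it just simulates the write-interval and read-skip behaviour as I read it, with both set to 32):

  public class IndexIntervalMath {
    public static void main(String[] args) {
      int indexInterval = 32, readSkip = 32, rows = 1024 * 1024;
      int written = 0, loaded = 0;
      for (int row = 1; row <= rows; row++) {
        if (row % indexInterval == 0) {       // write side: one index entry per interval
          written++;
          if (written % readSkip == 0) {      // read side: keep only every readSkip-th entry
            loaded++;
          }
        }
      }
      // prints rows=1048576 written=32768 loaded=1024, i.e. one entry per 1024 rows
      System.out.println("rows=" + rows + " written=" + written + " loaded=" + loaded);
    }
  }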

Take a look and confirm, and we can open a bug on Hadoop about it.

Billy











