On Tue, Feb 21, 2012 at 9:29 PM, M. C. Srivas <[email protected]> wrote:
> Yes, that was my thinking --- to do a major compaction the region-server
> would have to load all the flushed files for that region, merge them, and
> then write out the new region. If the region-file was 20g in size, the
> region-server would require well over 20g of heap space to do this work. Am
> I completely off?
>
You are a little off. We open all the hfiles and then stream through each of them, doing a merge sort and streaming the output to the new compacted file, so heap usage stays far below the total region size. Here is where we open a scanner over all the files to compact and then, as we inch through, figure out what to write to the output: http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#1393 (It's a bit hard to follow what's going on -- file selection has already been done higher up in the call chain.)

St.Ack
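To illustrate the idea (this is a minimal standalone sketch of a streaming k-way merge, not HBase's actual Store/StoreScanner code -- the class and method names here are made up), each input file is treated as a sorted iterator and a priority queue holds only the current head key of each file, so memory use is proportional to the number of files, not their total size:

```java
import java.util.*;

// Hypothetical sketch: merge-sort several sorted "files" while streaming.
// Only one head entry per input is in memory at a time, which is why a
// compaction of a 20g region does not need 20g of heap.
public class MergeCompactionSketch {

    // Each "file" is modeled as an iterator over sorted keys.
    static List<String> compact(List<Iterator<String>> files) {
        // Heap of (headKey, sourceIterator) pairs, ordered by key.
        PriorityQueue<Map.Entry<String, Iterator<String>>> heap =
            new PriorityQueue<>(Map.Entry.comparingByKey());
        for (Iterator<String> it : files) {
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        // Stands in for the writer of the new compacted file.
        List<String> output = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Pop the globally smallest key and append it to the output.
            Map.Entry<String, Iterator<String>> e = heap.poll();
            output.add(e.getKey());
            // Refill the heap from whichever file that key came from.
            Iterator<String> it = e.getValue();
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Iterator<String>> files = Arrays.asList(
            Arrays.asList("a", "d", "g").iterator(),
            Arrays.asList("b", "e").iterator(),
            Arrays.asList("c", "f").iterator());
        System.out.println(compact(files)); // [a, b, c, d, e, f, g]
    }
}
```

In the real code the heap lives inside the scanner opened over the files being compacted, and the merged output is written directly to the new hfile rather than buffered in a list.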
