On Tue, Feb 21, 2012 at 9:29 PM, M. C. Srivas <[email protected]> wrote:
> Yes, that was my thinking --- to do a major compaction the region-server
> would have to load all the flushed files for that region, merge them, and
> then write out the new region. If the region-file was 20g in size, the
> region-server would require well over 20g of heap space to do this work. Am
> I completely off?
>
You are a little off. We open all the hfiles and then stream through each of them, doing a merge sort and streaming the output to the new compacted file, so heap usage stays far below the total region size. Here is where we open a scanner over all the files to compact and then, as we inch through, figure out what to write to the output: http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#1393 (It's a bit hard to follow what's going on -- file selection has already been done higher up in the call chain.)

St.Ack
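To illustrate the idea (this is a minimal standalone sketch of a streaming k-way merge, not HBase's actual Store/StoreScanner code -- the class and method names here are made up), each input file is treated as a sorted iterator and a priority queue holds only the current head key of each file, so memory use is proportional to the number of files, not their total size:

```java
import java.util.*;

// Hypothetical sketch: merge-sort several sorted "files" while streaming.
// Only one head entry per input is in memory at a time, which is why a
// compaction of a 20g region does not need 20g of heap.
public class MergeCompactionSketch {

    // Each "file" is modeled as an iterator over sorted keys.
    static List<String> compact(List<Iterator<String>> files) {
        // Heap of (headKey, sourceIterator) pairs, ordered by key.
        PriorityQueue<Map.Entry<String, Iterator<String>>> heap =
            new PriorityQueue<>(Map.Entry.comparingByKey());
        for (Iterator<String> it : files) {
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        // Stands in for the writer of the new compacted file.
        List<String> output = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Pop the globally smallest key and append it to the output.
            Map.Entry<String, Iterator<String>> e = heap.poll();
            output.add(e.getKey());
            // Refill the heap from whichever file that key came from.
            Iterator<String> it = e.getValue();
            if (it.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Iterator<String>> files = Arrays.asList(
            Arrays.asList("a", "d", "g").iterator(),
            Arrays.asList("b", "e").iterator(),
            Arrays.asList("c", "f").iterator());
        System.out.println(compact(files)); // [a, b, c, d, e, f, g]
    }
}
```

In the real code the heap lives inside the scanner opened over the files being compacted, and the merged output is written directly to the new hfile rather than buffered in a list.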
