[ https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613557#action_12613557 ]

LN commented on HBASE-745:
--------------------------

compaction improvement:

compaction has very poor efficiency in the current hbase release (0.1.3). suppose 
there are 3 mapfiles in an HStore: the 1 original is 128M, and the 2 newly flushed 
ones are smaller than 1M each (this is the most common situation when a 
regionserver carries 512 hstores or more, flushing a 256M global memcache each 
time). we compacted 2M of data, but read and wrote about 120M!
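
to make the amplification concrete, here is a back-of-envelope calculation 
(a full compaction merges every file in the store, so everything gets read and 
rewritten; the sizes are just the ones from the example above):

{code}
public class CompactionAmplification {
  public static void main(String[] args) {
    long original  = 128L << 20;          // the existing 128M mapfile
    long flushed   = 2 * (1L << 20);      // two freshly flushed files, ~1M each
    long rewritten = original + flushed;  // a full compaction reads and rewrites all of it
    System.out.printf("new data: %dM, compaction io: %dM, amplification: ~%dx%n",
        flushed >> 20, rewritten >> 20, rewritten / flushed);
  }
}
{code}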

my suggestions:
1. set the threshold larger; this causes fewer compactions, but more mapfiles 
(will discuss memory usage later in this issue)
2. implement incremental compaction, that means: don't compact to 1 file each 
time, compact the small files only, and do a whole compaction only when the 
accumulated file size is large enough. in HStore#compact(boolean), we can use an 
algorithm to select the hstorefiles for compacting. (will attach my impl for 
review later; a rough sketch follows below.)
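
for illustration, a minimal sketch of the kind of selection logic i mean. this 
is not the actual patch: StoreFileInfo is a hypothetical stand-in for HStoreFile, 
and the size-ratio rule is just one plausible selection policy.

{code}
import java.util.ArrayList;
import java.util.List;

// hypothetical stand-in for HStoreFile; only the on-disk length matters here
class StoreFileInfo {
  final String path;
  final long length;
  StoreFileInfo(String path, long length) { this.path = path; this.length = length; }
}

class IncrementalCompactionSelector {
  /**
   * Walk the store's files newest-to-oldest (input is ordered oldest-first)
   * and collect files until one is found that dwarfs the bytes collected so
   * far. Compacting only the collected run rewrites a few MB instead of the
   * whole store; the big original file waits for a whole compaction.
   */
  static List<StoreFileInfo> select(List<StoreFileInfo> files,
      int minFilesToCompact, double sizeRatio) {
    List<StoreFileInfo> picked = new ArrayList<StoreFileInfo>();
    long pickedBytes = 0;
    for (int i = files.size() - 1; i >= 0; i--) {
      StoreFileInfo f = files.get(i);
      if (!picked.isEmpty() && f.length > sizeRatio * pickedBytes) {
        break; // this older file is too big; leave it alone for now
      }
      picked.add(f);
      pickedBytes += f.length;
    }
    // not worth a compaction pass unless enough small files have piled up
    return picked.size() >= minFilesToCompact
        ? picked : new ArrayList<StoreFileInfo>();
  }
}
{code}

with the example above (128M + 1M + 1M, sizeRatio 2, minFilesToCompact 2), this 
selects only the two small files, so the pass costs ~2M of io instead of ~130M.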


> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>
>                 Key: HBASE-745
>                 URL: https://issues.apache.org/jira/browse/HBASE-745
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.1.3
>         Environment: hadoop 0.17.1
>            Reporter: LN
>            Priority: Minor
>
> after weeks of testing hbase 0.1.3 and hadoop (0.16.4, 0.17.1), i found there is 
> much work to do before a particular regionserver can handle data around 100G, 
> or even more. i'd like to share my opinions here with stack and other developers. 
> first, the easiest way to improve the scalability of a regionserver is upgrading 
> the hardware: use a 64-bit os and 8G of memory for the regionserver process, and 
> speed up disk io. 
> besides hardware, the following are software bottlenecks i found in the regionserver: 
> 1. as data grows, compaction eats cpu (and io) time; the total compaction time is 
> basically linear in the whole data size, or even worse, sometimes quadratic in it. 
> 2. memory and socket connection usage depend on the opened mapfiles, see 
> HADOOP-2341 and HBASE-24. 
> will explain the above in comments later.
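
(to see why point 1 above can go quadratic, a toy model, assuming the worst 
case where every memcache flush triggers a full compaction of everything 
accumulated so far:)

{code}
public class QuadraticCompactionCost {
  public static void main(String[] args) {
    long flush = 1L << 20;  // 1M per memcache flush
    long store = 0, totalIo = 0;
    for (int n = 1; n <= 512; n++) {
      store += flush;       // new data arrives
      totalIo += store;     // a full compaction rewrites the whole store
    }
    // totalIo == flush * n*(n+1)/2, i.e. quadratic in the data size;
    // it stays linear only if each pass rewrites a bounded amount
    System.out.printf("data: %dM, total compaction io: %dM%n",
        store >> 20, totalIo >> 20);
  }
}
{code}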

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
