[ 
https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614429#action_12614429
 ] 

LN commented on HBASE-745:
--------------------------

maybe i'm just hungry for a stronger hbase :-) i know robustness and
scalability (in that order) are the focus of the 0.2 release. and "3TB of data
on about ~50 nodes" means 60G per regionserver, which is not very hard: with
0.1.3, each regionserver (default config) can handle 30G of data on my testing
server.

i'm trying to make a regionserver handle more data, maybe 1T? because i think
the resource (memory, cpu) usage of a regionserver should not depend on the
existing data size, but on the active data size (read/write throughput).

i think i found the bottlenecks (compaction eating cpu, open mapfiles eating
memory), but i'm NOT SURE about my solution, so i'm pasting it here for review,
esp. from Jim and Stack.

here is my 'total solution', which i named the '0.1.3/0.17.1 scalability pack':
1. patch HBASE-749 for 0.17.1 compatibility
2. patch HADOOP-3778 for a socket exception bug
3. HADOOP-3779 for the concurrent connection limit of the datanode (patch not
attached)
4. the attached incremental compaction patch
5. an "open mapfile reader" limitation patch, which implements my suggestion
above, but it doesn't look good, so i haven't attached it.
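
to make item 5 concrete: one common way to cap open readers is a small
least-recently-used pool. this is only a sketch of that general idea, not the
unattached patch. the names ReaderPool, open_reader and max_open are
hypothetical, and a "reader" here is just any object with a close() method.

```python
from collections import OrderedDict


class ReaderPool:
    """Keeps at most max_open readers open, closing the least recently used."""

    def __init__(self, open_reader, max_open):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        # hypothetical factory: takes a path, returns an object with close()
        self._open_reader = open_reader
        self._max_open = max_open
        # path -> reader, ordered from least to most recently used
        self._readers = OrderedDict()

    def get(self, path):
        """Return the reader for path, opening it if it is not cached."""
        reader = self._readers.pop(path, None)
        if reader is None:
            reader = self._open_reader(path)
        # re-inserting moves the entry to the most recently used end
        self._readers[path] = reader
        # evict from the least recently used end until under the cap
        while len(self._readers) > self._max_open:
            _, victim = self._readers.popitem(last=False)
            victim.close()
        return reader

    def open_count(self):
        return len(self._readers)
```

with a pool like this, memory held by open readers is bounded by max_open
rather than by the number of files on disk. the cost is that a read of a cold
file has to reopen it first.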

with the above and some adjusted config properties, my regionserver is
handling about 400G of data now, with about 15G of testing write throughput
per day.

 

> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>
>                 Key: HBASE-745
>                 URL: https://issues.apache.org/jira/browse/HBASE-745
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.1.3, 0.2.0
>         Environment: hadoop 0.17.1
>            Reporter: LN
>            Priority: Minor
>         Attachments: HBASE-745.compact.patch
>
>
> after weeks of testing hbase 0.1.3 and hadoop (0.16.4, 0.17.1), i found there
> is much work to do before a particular regionserver can handle about 100G of
> data, or even more. i'd like to share my opinions here with stack and other
> developers.
> first, the easiest way to improve the scalability of a regionserver is
> upgrading hardware: use a 64bit os and 8G memory for the regionserver
> process, and speed up disk io.
> besides hardware, the following are software bottlenecks i found in the
> regionserver:
> 1. as data increases, compaction eats cpu (and io) time; the total
> compaction time is basically linear in the whole data size, or even worse,
> sometimes quadratic in that size.
> 2. memory usage depends on open mapfiles
> 3. network connections depend on open mapfiles, see HADOOP-2341 and
> HBASE-24.
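
a toy model may help show where the linear vs. quadratic difference in
bottleneck 1 comes from. the assumptions are mine and are not taken from hbase
or from the attached patch: every flush adds one file of flush_size bytes, a
compaction runs after every `threshold` flushes, a "full" compaction rewrites
the whole store, and an "incremental" one merges only the new flush files.

```python
def bytes_rewritten(num_flushes, flush_size, threshold, incremental):
    """Total bytes written by compactions in a toy store model."""
    total = 0
    compacted = 0  # bytes already sitting in previously compacted files
    pending = 0    # flush files that have not been compacted yet
    for _ in range(num_flushes):
        pending += 1
        if pending == threshold:
            new_bytes = pending * flush_size
            if incremental:
                # merge only the new flush files
                total += new_bytes
            else:
                # rewrite everything already in the store plus the new files
                total += compacted + new_bytes
            compacted += new_bytes
            pending = 0
    return total


if __name__ == "__main__":
    for flushes in (12, 24):
        full = bytes_rewritten(flushes, 1, 3, incremental=False)
        incr = bytes_rewritten(flushes, 1, 3, incremental=True)
        print(flushes, full, incr)
```

in this model, going from 12 to 24 flushes takes the full variant from 30 to
108 units written, while the incremental variant goes from 12 to 24. the
incremental variant leaves more files behind, which is where bottlenecks 2 and
3 come in.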

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
