[ https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613566#action_12613566 ]

LN commented on HBASE-745:
--------------------------

Memory calculation:
Memory usage of a regionserver is determined by 3 things:
#1. the mapfile index read into memory (io.map.index.skip can reduce it, but whatever is loaded stays in memory whether you need it or not)
#2. the data output buffer used by each SequenceFile$Reader (each can be measured as the largest value size in the file)
#3. memcache, controlled by 'globalMemcacheLimit' and 'globalMemcacheLimitLowMark'

That is, besides #3, which is already controlled, memory is determined by the concurrently open mapfiles (in fact, by the open SequenceFiles underlying the mapfile data).
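
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch; every figure in it (number of open readers, index entry size, largest value size, memcache limit) is an illustrative assumption, not a measurement from this issue.

/**
 * Illustrative estimate of regionserver memory from the three components
 * above. All numbers are assumptions chosen only for the example.
 */
public class RegionServerMemoryEstimate {
  public static void main(String[] args) {
    long openReaders = 200;                  // assumed concurrently open mapfile readers
    long indexEntriesPerFile = 10000;        // index entries kept after io.map.index.skip
    long bytesPerIndexEntry = 64;            // rough key + offset footprint per entry
    long largestValueBytes = 64 * 1024;      // data output buffer per SequenceFile$Reader
    long globalMemcacheLimit = 256L * 1024 * 1024;

    long indexBytes  = openReaders * indexEntriesPerFile * bytesPerIndexEntry; // #1
    long bufferBytes = openReaders * largestValueBytes;                        // #2
    long totalBytes  = indexBytes + bufferBytes + globalMemcacheLimit;         // plus #3

    System.out.printf("index=%dMB buffers=%dMB memcache=%dMB total=%dMB%n",
        indexBytes >> 20, bufferBytes >> 20, globalMemcacheLimit >> 20, totalBytes >> 20);
  }
}

With these made-up figures the index portion alone is already larger than a tight memcache limit, which is the point of the list above: the open readers, not the memcache, dominate.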

In HBASE-24, stack advised controlling either the number of open regions or the number of open mapfile readers; I'd prefer controlling the opened mapfile readers directly, since they are the core of regionserver resource usage.

My suggestions for regionserver memory:
1. Upgrade to hadoop 0.17.1 (there's only one line in hbase 0.1.3 that is incompatible with hadoop 0.17.1; I'll file an issue separately). HADOOP-2346 resolved the runaway connections/threads in the DataNode by using read/write timeouts.
2. Set globalMemcacheLimit to a lower value if your application doesn't read recently inserted records frequently.
3. Implement a MonitoredMapFileReader that extends MapFile.Reader and limits the number of concurrently open instances with an LRU, checking out/in in every MapFile.Reader method; make HStoreFile.HbaseMapFile.HbaseReader extend MonitoredMapFileReader (see the sketch after this list).
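
A minimal sketch of the checkout/checkin idea from point 3, kept independent of the real MapFile.Reader API: the pool below (class and method names are mine, not from hbase) caps the number of concurrently open readers with an access-ordered LRU and closes the eldest one when the cap is exceeded. A MonitoredMapFileReader would wrap every read method in checkOut/checkIn against such a pool.

import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: cap concurrently open readers with an access-ordered LRU. */
public class OpenReaderPool<R extends Closeable> {
  /** Factory used to (re)open a reader when it is not currently open. Illustrative only. */
  public interface ReaderFactory<R> {
    R open(String name) throws IOException;
  }

  private final int maxOpen;
  private final ReaderFactory<R> factory;
  private final LinkedHashMap<String, R> open;

  public OpenReaderPool(int maxOpen, ReaderFactory<R> factory) {
    this.maxOpen = maxOpen;
    this.factory = factory;
    this.open = new LinkedHashMap<String, R>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
        if (size() > OpenReaderPool.this.maxOpen) {
          try { eldest.getValue().close(); } catch (IOException ignored) { }
          return true;   // evict the least recently used reader
        }
        return false;
      }
    };
  }

  /** Called at the start of every reader method: returns an open reader, reopening if needed. */
  public synchronized R checkOut(String name) throws IOException {
    R reader = open.get(name);            // touches the LRU order
    if (reader == null) {
      reader = factory.open(name);        // reopen a previously evicted reader
      open.put(name, reader);             // may evict the eldest open reader
    }
    return reader;
  }

  /** Called at the end of every reader method; the reader stays cached until evicted. */
  public synchronized void checkIn(String name) {
    // nothing to do in this sketch: readers remain open until the LRU evicts them
  }
}

A real implementation would also have to make sure a reader that is in the middle of a read cannot be evicted, but the sketch shows how the cap keeps #1 and #2 above bounded regardless of how many mapfiles a region holds.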

Beyond the 0.1.3 release, I think hbase needs an interface like HStoreFileReader to abstract the file reading methods; that would make controlling open readers much easier.
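
Something along these lines is what I mean; the method set below is only my guess at a minimal surface, not an existing hbase API.

import java.io.Closeable;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

/**
 * Sketch of the proposed abstraction (the name comes from the comment above;
 * the methods are my guess at what HStore minimally needs from a store file reader).
 */
public interface HStoreFileReader extends Closeable {
  /** Exact-match lookup; returns true and fills 'value' if the key exists. */
  boolean get(WritableComparable key, Writable value) throws IOException;

  /** Sequential scan: read the next key/value pair, returning false at end of file. */
  boolean next(WritableComparable key, Writable value) throws IOException;

  /** Restart the scan from the beginning of the file. */
  void reset() throws IOException;
}

With an interface like this in place, something like the LRU pool sketched above could sit behind it, and HStore would not need to know whether the bytes come from an open MapFile or from a reader that was transparently reopened.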

> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>
>                 Key: HBASE-745
>                 URL: https://issues.apache.org/jira/browse/HBASE-745
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.1.3
>         Environment: hadoop 0.17.1
>            Reporter: LN
>            Priority: Minor
>
> After weeks of testing hbase 0.1.3 and hadoop (0.16.4, 0.17.1), I found there is 
> much work to do before a particular regionserver can handle about 100G of data, 
> or even more. I'd like to share my opinions here with stack and other developers.
> First, the easiest way to improve the scalability of a regionserver is to upgrade 
> the hardware: use a 64bit os and 8G of memory for the regionserver process, and 
> speed up disk io. 
> Besides hardware, the following are the software bottlenecks I found in the 
> regionserver:
> 1. As data increases, compaction eats cpu (and io) time; the total compaction 
> time is basically linear in the whole data size, or even worse, sometimes 
> quadratic in that size.
> 2. Memory and socket connection usage depend on the opened mapfiles; see 
> HADOOP-2341 and HBASE-24. 
> I will explain the above in comments later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
