[ https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613566#action_12613566 ]
LN commented on HBASE-745:
--------------------------
Memory calculation:
The memory usage of a regionserver is determined by three things:
#1. The mapfile indexes read into memory (io.map.index.skip can adjust this,
but all of it stays in memory whether you need it or not).
#2. The data output buffer used by each SequenceFile$Reader (each can be
estimated as the largest value size in the file).
#3. The memcache, controlled by 'globalMemcacheLimit' and
'globalMemcacheLimitLowMark'.
That is, besides #3, which is already controlled, memory is determined by the
number of concurrently open mapfiles (in fact, the open SequenceFiles that
hold the mapfile data).
In HBASE-24, stack advises limiting either the number of open regions or the
number of open mapfile readers. I'd prefer controlling the open mapfile
readers directly, since they are the core of regionserver resource usage.
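A rough sketch of that calculation in Java, just to make the three terms concrete. The class, method, and field names are made up for illustration and are not part of hbase or hadoop, and the sizes in main() are invented numbers, not measurements.

```java
import java.util.Arrays;
import java.util.List;

/** Back-of-the-envelope estimate of regionserver memory. Hypothetical names throughout. */
final class RegionServerMemoryEstimate {

    /** One open mapfile reader: its in-memory index plus the reader's data buffer. */
    static final class OpenMapFile {
        final long indexBytes;         // #1: index held in memory for this mapfile
        final long largestValueBytes;  // #2: buffer size, approximated by the largest value

        OpenMapFile(long indexBytes, long largestValueBytes) {
            this.indexBytes = indexBytes;
            this.largestValueBytes = largestValueBytes;
        }
    }

    /** Sums the three components: indexes, reader buffers, and the memcache limit. */
    static long estimateBytes(List<OpenMapFile> openFiles, long globalMemcacheLimitBytes) {
        long total = globalMemcacheLimitBytes;  // #3: bounded by configuration
        for (OpenMapFile f : openFiles) {
            total += f.indexBytes;              // #1: grows with every open mapfile
            total += f.largestValueBytes;       // #2: grows with every open mapfile
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example: two open mapfiles, 10 MB of index and 1 MB largest value each.
        List<OpenMapFile> files = Arrays.asList(
                new OpenMapFile(10 * mb, 1 * mb),
                new OpenMapFile(10 * mb, 1 * mb));
        long estimate = estimateBytes(files, 512 * mb);
        System.out.println("estimated bytes: " + estimate);
    }
}
```

Only the last argument is capped by configuration; the first two terms grow with the number of open mapfiles, which is why limiting open readers matters.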
My suggestions for regionserver memory:
1. Upgrade to hadoop 0.17.1 (only one line in hbase 0.1.3 is incompatible with
hadoop 0.17.1; I'll file an issue separately). HADOOP-2346 resolved the
DataNode running out of connections/threads by using read/write timeouts.
2. Set globalMemcacheLimit to a lower size if your application doesn't
frequently read recently inserted records.
3. Implement a MonitoredMapFileReader that extends MapFile.Reader and limits
the number of concurrently open instances using LRU, with a checkin/checkout
in every MapFile.Reader method. Make HStoreFile.HbaseMapFile.HbaseReader
extend MonitoredMapFileReader.
Beyond release 0.1.3, I think hbase needs an interface like
HStoreFileReader to abstract the file reading methods; that would make
controlling open readers much easier.
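A minimal sketch of the checkin/checkout idea from suggestion 3, assuming a pool keyed by file path. It does not use the real MapFile.Reader; LruReaderPool, Opener, checkout and checkin are hypothetical names, and any Closeable stands in for a reader. A real MonitoredMapFileReader would also have to reopen a closed reader transparently inside each method.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Keeps at most maxOpen idle-or-busy readers open, closing the least recently used idle one. */
final class LruReaderPool<R extends Closeable> {

    /** Hypothetical factory that opens the reader for a store file path. */
    interface Opener<R> {
        R open(String path) throws IOException;
    }

    private static final class Entry<R> {
        final R reader;
        int inUse;  // number of outstanding checkouts
        Entry(R reader) { this.reader = reader; }
    }

    private final int maxOpen;
    private final Opener<R> opener;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Entry<R>> open =
            new LinkedHashMap<>(16, 0.75f, true);

    LruReaderPool(int maxOpen, Opener<R> opener) {
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns an open reader for path, opening it (and evicting an idle one) if needed. */
    synchronized R checkout(String path) throws IOException {
        Entry<R> e = open.get(path);  // get() also marks the entry as recently used
        if (e == null) {
            evictIdle();
            e = new Entry<>(opener.open(path));
            open.put(path, e);
        }
        e.inUse++;
        return e.reader;
    }

    /** Marks one checkout of path as finished so the reader may be evicted later. */
    synchronized void checkin(String path) {
        Entry<R> e = open.get(path);
        if (e != null && e.inUse > 0) {
            e.inUse--;
        }
    }

    synchronized int openCount() {
        return open.size();
    }

    // Closes least recently used idle readers until there is room for one more.
    // If every reader is checked out, the pool temporarily exceeds maxOpen.
    private void evictIdle() throws IOException {
        Iterator<Map.Entry<String, Entry<R>>> it = open.entrySet().iterator();
        while (open.size() >= maxOpen && it.hasNext()) {
            Entry<R> e = it.next().getValue();
            if (e.inUse == 0) {
                e.reader.close();
                it.remove();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        LruReaderPool<Closeable> pool = new LruReaderPool<>(
                2, path -> () -> System.out.println("closed " + path));
        for (String p : new String[] {"a", "b", "a", "c"}) {
            pool.checkout(p);
            pool.checkin(p);
        }
        System.out.println("open readers: " + pool.openCount());
    }
}
```

The in-use count is what the checkin/checkout in every reader method would maintain: a reader that is in the middle of a call is never closed underneath its caller.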
> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>
> Key: HBASE-745
> URL: https://issues.apache.org/jira/browse/HBASE-745
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.1.3
> Environment: hadoop 0.17.1
> Reporter: LN
> Priority: Minor
>
> After weeks of testing hbase 0.1.3 and hadoop (0.16.4, 0.17.1), I found there
> is much work to do before a single regionserver can handle about 100G of
> data, or even more. I'd like to share my opinions here with stack and other
> developers.
> First, the easiest way to improve the scalability of a regionserver is to
> upgrade the hardware: use a 64-bit OS and 8G of memory for the regionserver
> process, and speed up disk IO.
> Besides hardware, the following are software bottlenecks I found in the
> regionserver:
> 1. As data grows, compaction eats CPU (and IO) time. The total compaction
> time is basically linear in the total data size or, even worse, sometimes
> quadratic in that size.
> 2. Memory and socket connection usage depend on the number of open mapfiles;
> see HADOOP-2341 and HBASE-24.
> I will explain the above in comments later.