[ https://issues.apache.org/jira/browse/HBASE-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366100#comment-14366100 ]

Zee Chen commented on HBASE-13259:
----------------------------------

Test results under the following conditions (a programmatic configuration sketch follows the list):

- 22-byte keys mapped to 32-byte values stored in a table, 16 KB hfile blocksize
- uniform key distribution, tested with gets from a large number of client threads
- hbase.regionserver.handler.count=100
- hbase.bucketcache.size=70000 (70GB)
- hbase.bucketcache.combinedcache.enabled=true
- hbase.bucketcache.ioengine=mmap:/dev/shm/bucketcache.0
- hbase.bucketcache.bucket.sizes=5120,7168,9216,11264,13312,17408,33792,41984,50176,58368,66560,99328,132096,197632,263168,394240,525312
- CMS GC
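
For reference, a minimal sketch of the cache settings above applied through the standard HBaseConfiguration API (the class name BucketCacheBenchConfig is just illustrative; in practice the same keys go into hbase-site.xml):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BucketCacheBenchConfig {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.regionserver.handler.count", "100");
    // bucket cache size is given in MB, so 70000 ~= 70 GB
    conf.set("hbase.bucketcache.size", "70000");
    conf.set("hbase.bucketcache.combinedcache.enabled", "true");
    // the mmap: prefix selects the mmap-backed engine for this file
    conf.set("hbase.bucketcache.ioengine", "mmap:/dev/shm/bucketcache.0");
    conf.set("hbase.bucketcache.bucket.sizes",
        "5120,7168,9216,11264,13312,17408,33792,41984,50176,58368,"
        + "66560,99328,132096,197632,263168,394240,525312");
    return conf;
  }
}
{code}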

At 85k gets per second, the system load (dstat output) looks like this:

{code}
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
 58  11  26   0   0   5|   0    16k|  17M   13M|   0     0 | 316k  255k
 59  11  25   0   0   5|2048k   12k|  18M   13M|   0     0 | 319k  254k
 58  11  25   0   0   5|   0    28k|  18M   13M|   0     0 | 318k  253k
 59  11  25   0   0   5|2048k    0 |  18M   13M|   0     0 | 318k  252k
{code}

with the following wire latency profile (unit: microseconds):
{code}
Quantile: 0.500000, Value: 361
Quantile: 0.750000, Value: 555
Quantile: 0.900000, Value: 830
Quantile: 0.950000, Value: 1077
Quantile: 0.980000, Value: 1604
Quantile: 0.990000, Value: 4212
Quantile: 0.999000, Value: 7221
Quantile: 1.000000, Value: 14406
{code}

FileIOEngine's latency profile is identical, but it showed higher sys CPU and 
lower user CPU, more context switches, and about 40% lower maximum throughput 
in gets per second.
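
To make the comparison concrete, here is a minimal sketch (not code from the patch; ReadPathSketch and its methods are illustrative) of the two read paths: FileIOEngine's positioned read costs a pread() syscall per cached block, while the mmap-based engine copies out of a MappedByteBuffer, which is a plain user-space memcpy once the pages are resident in /dev/shm:

{code}
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ReadPathSketch {
  // pread()-style path: every cached-block read is a positioned syscall.
  static ByteBuffer readWithPread(FileChannel ch, long offset, int len) throws Exception {
    ByteBuffer dst = ByteBuffer.allocate(len);
    while (dst.hasRemaining()) {
      if (ch.read(dst, offset + dst.position()) < 0) {
        break;  // hit EOF before filling the buffer
      }
    }
    dst.flip();
    return dst;
  }

  // mmap()-style path: map once, then each read is a memcpy in user space.
  static ByteBuffer readWithMmap(MappedByteBuffer map, int offset, int len) {
    ByteBuffer src = map.duplicate();  // independent position/limit on same pages
    src.position(offset);
    src.limit(offset + len);
    ByteBuffer dst = ByteBuffer.allocate(len);
    dst.put(src);                      // plain memory copy, no syscall
    dst.flip();
    return dst;
  }

  public static void main(String[] args) throws Exception {
    try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/bucketcache.0", "r");
         FileChannel ch = raf.getChannel()) {
      // A single MappedByteBuffer is limited to 2 GB, so a large cache file
      // has to be split across several mappings (the role of ByteBufferArray).
      long mapLen = Math.min(ch.size(), 64L * 1024 * 1024);
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, mapLen);
      ByteBuffer viaPread = readWithPread(ch, 0, 16 * 1024);
      ByteBuffer viaMmap = readWithMmap(map, 0, 16 * 1024);
    }
  }
}
{code}

Because the data sits in /dev/shm either way, the latency profiles match; the syscall-per-read difference shows up as the higher sys CPU, extra context switches, and lower peak throughput noted above.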

The patch was tested at 140k gets per second for 2 weeks nonstop.

> mmap() based BucketCache IOEngine
> ---------------------------------
>
>                 Key: HBASE-13259
>                 URL: https://issues.apache.org/jira/browse/HBASE-13259
>             Project: HBase
>          Issue Type: New Feature
>          Components: BlockCache
>    Affects Versions: 0.98.10
>            Reporter: Zee Chen
>             Fix For: 2.2.0
>
>         Attachments: HBASE-13259-v2.patch, HBASE-13259.patch, ioread-1.svg, 
> mmap-0.98-v1.patch, mmap-1.svg, mmap-trunk-v1.patch
>
>
> Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data 
> from kernel space to user space. This is a good choice when the total working 
> set size is much bigger than the available RAM and the latency is dominated 
> by IO access. However, when the entire working set is small enough to fit in 
> the RAM, using mmap() (and subsequent memcpy()) to move data from kernel 
> space to user space is faster. I have run some short keyval get tests and 
> the results indicate a reduction of 2%-7% in kernel CPU on my system, 
> depending on the load. On the gets, the latency histograms from mmap() are 
> identical to those from pread(), but peak throughput is close to 40% higher.
> This patch modifies ByteBufferArray to allow it to specify a backing file.
> Example of using this feature: set hbase.bucketcache.ioengine to 
> mmap:/dev/shm/bucketcache.0 in hbase-site.xml.
> Attached are perf-measured CPU usage breakdowns as flame graphs.


