[
https://issues.apache.org/jira/browse/HBASE-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366100#comment-14366100
]
Zee Chen commented on HBASE-13259:
----------------------------------
Test results under the following conditions (a code sketch of the cache settings follows this list):
- 22-byte key to 32-byte value map stored in a table, 16 KB HFile block size
- uniform key distribution, tested with gets from a large number of client threads
- hbase.regionserver.handler.count=100
- hbase.bucketcache.size=70000 (70GB)
- hbase.bucketcache.combinedcache.enabled=true
- hbase.bucketcache.ioengine=mmap:/dev/shm/bucketcache.0
- hbase.bucketcache.bucket.sizes=5120,7168,9216,11264,13312,17408,33792,41984,50176,58368,66560,99328,132096,197632,263168,394240,525312
- CMS GC
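As a rough illustration only (not part of the patch), the cache-related settings above could also be applied programmatically through the standard Hadoop/HBase Configuration API, e.g. in a test harness; the helper class below is hypothetical, while the property names and values are taken from the list above:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical helper that mirrors the test configuration listed above.
public class BucketCacheTestConfig {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.regionserver.handler.count", 100);
    // Values above 1.0 are read as the cache capacity in MB, so 70000 is ~70 GB.
    conf.setFloat("hbase.bucketcache.size", 70000f);
    conf.setBoolean("hbase.bucketcache.combinedcache.enabled", true);
    conf.set("hbase.bucketcache.ioengine", "mmap:/dev/shm/bucketcache.0");
    conf.set("hbase.bucketcache.bucket.sizes",
        "5120,7168,9216,11264,13312,17408,33792,41984,50176,58368,"
        + "66560,99328,132096,197632,263168,394240,525312");
    return conf;
  }
}
{code}
The same properties can of course go into hbase-site.xml, which is how the issue description below suggests enabling the mmap engine.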
At 85k gets per second, the system looks like:
{code}
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 58  11  26   0   0   5|    0   16k|  17M   13M|    0     0| 316k  255k
 59  11  25   0   0   5|2048k   12k|  18M   13M|    0     0| 319k  254k
 58  11  25   0   0   5|    0   28k|  18M   13M|    0     0| 318k  253k
 59  11  25   0   0   5|2048k    0 |  18M   13M|    0     0| 318k  252k
{code}
with the following wire latency profile (in microseconds):
{code}
Quantile: 0.500000, Value: 361
Quantile: 0.750000, Value: 555
Quantile: 0.900000, Value: 830
Quantile: 0.950000, Value: 1077
Quantile: 0.980000, Value: 1604
Quantile: 0.990000, Value: 4212
Quantile: 0.999000, Value: 7221
Quantile: 1.000000, Value: 14406
{code}
FileIOEngine's latency profile was identical, but it showed higher sys CPU and
lower user CPU, more context switches, and roughly 40% lower maximum throughput
in gets per second.
The patch was tested at 140k gets per second, running nonstop for 2 weeks.
> mmap() based BucketCache IOEngine
> ---------------------------------
>
> Key: HBASE-13259
> URL: https://issues.apache.org/jira/browse/HBASE-13259
> Project: HBase
> Issue Type: New Feature
> Components: BlockCache
> Affects Versions: 0.98.10
> Reporter: Zee Chen
> Fix For: 2.2.0
>
> Attachments: HBASE-13259-v2.patch, HBASE-13259.patch, ioread-1.svg,
> mmap-0.98-v1.patch, mmap-1.svg, mmap-trunk-v1.patch
>
>
> Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data
> from kernel space to user space. This is a good choice when the total working
> set size is much bigger than the available RAM and the latency is dominated
> by IO access. However, when the entire working set is small enough to fit in
> the RAM, using mmap() (and subsequent memcpy()) to move data from kernel
> space to user space is faster. I have run some short key-value get tests, and
> the results indicate a 2%-7% reduction in kernel CPU on my system, depending
> on the load. On the gets, the latency histograms from mmap() are
> identical to those from pread(), but peak throughput is close to 40% higher.
> This patch modifies ByteBufferArray to allow it to specify a backing file.
> Example of using this feature: set hbase.bucketcache.ioengine to
> mmap:/dev/shm/bucketcache.0 in hbase-site.xml.
> Attached are perf-measured CPU usage breakdowns as flame graphs.
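To make the pread() vs mmap() difference described above concrete, here is a minimal Java sketch, not the actual FileIOEngine or ByteBufferArray code from the patch; the class name is made up, the /dev/shm path mirrors the example above, and the offsets and lengths are only illustrative:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch only: contrasts a positional read (pread-style) with a
// copy out of an mmap()-ed region.
public class MmapVsPreadSketch {

  // pread()-style path: each FileChannel.read(dst, position) call is a
  // positional read syscall that copies from kernel space into dst.
  static byte[] readWithPread(FileChannel ch, long offset, int len) throws IOException {
    ByteBuffer dst = ByteBuffer.allocate(len);
    while (dst.hasRemaining()) {
      int n = ch.read(dst, offset + dst.position());
      if (n < 0) {
        throw new IOException("EOF before reading " + len + " bytes");
      }
    }
    return dst.array();
  }

  // mmap()-style path: the file is mapped once up front; a get is then just a
  // memcpy out of the mapping, with no read syscall on the hot path.
  static byte[] readWithMmap(MappedByteBuffer map, int offset, int len) {
    byte[] out = new byte[len];
    ByteBuffer view = map.duplicate(); // duplicate so readers keep independent positions
    view.position(offset);
    view.get(out, 0, len);
    return out;
  }

  public static void main(String[] args) throws IOException {
    // The /dev/shm path mirrors the hbase.bucketcache.ioengine example above.
    try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/bucketcache.0", "rw");
         FileChannel ch = raf.getChannel()) {
      raf.setLength(16 * 1024); // make sure there is a region to map
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
      byte[] viaPread = readWithPread(ch, 0, 32);
      byte[] viaMmap = readWithMmap(map, 0, 32);
      System.out.println(viaPread.length + " / " + viaMmap.length + " bytes read");
    }
  }
}
{code}
Because the mmap() path avoids a read syscall per cached block, it lines up with the lower sys CPU and fewer context switches reported for the mmap IOEngine in the comment above.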
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)