[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866315#comment-13866315 ]
stack commented on HBASE-10156:
-------------------------------
I did a comparison of trunk against this patch using [~fenghh]'s little recipe, on a
little cluster of four datanodes with the HLogPE running on a separate namenode
machine. Here is the command:
{code}
$ for i in 1 3 5 10 25 50 100 200; do for j in 1; do
    perf stat ./bin/hbase --config /home/stack/conf_hbase \
      org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation \
      -threads $i -iterations 1000000 -keySize 50 -valueSize 100 \
      &> "/tmp/nopatch2${i}.${j}.txt"
  done; done
{code}
The above runs too long; next time I will do 1/10th the iterations. But here are
the numbers for the above:
||Clients||nopatch (elapsed seconds)||withpatch (elapsed seconds)||% diff||
|1|1229|1245|-1|
|3|1360|1350|0|
|5|1094|1057|-2|
|10|1384|1382|0|
|25|1521|1481|-3|
|50|1892|1484|-22|
|100|3009|1555|-48|
|200|5442|1851|-66|
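For the record, the % diff column is the patched elapsed time relative to the nopatch
elapsed time. A quick illustrative sketch of that arithmetic (the class below is mine,
not part of HBase or HLogPE):
{code}
// How I read the "% diff" column above: percent change of the patched
// elapsed seconds relative to the unpatched elapsed seconds.
// Illustrative helper only; not part of HBase or HLogPerformanceEvaluation.
public class WalElapsedDiff {
  static double percentDiff(double nopatchSecs, double withpatchSecs) {
    return (withpatchSecs - nopatchSecs) / nopatchSecs * 100.0;
  }

  public static void main(String[] args) {
    // 50 clients: 1892s vs 1484s => about -22%
    System.out.printf("50 clients:  %.0f%%%n", percentDiff(1892, 1484));
    // 200 clients: 5442s vs 1851s => about -66%
    System.out.printf("200 clients: %.0f%%%n", percentDiff(5442, 1851));
  }
}
{code}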
The one-to-five (or ten) client comparison is probably of no value; the sync time
predominates there. At 50 clients/handlers, this patch shows a definite improvement.
I've been measuring a bit over the last week or so, and elapsed time is a pretty
erratic measure given it varies +/-10% or more between runs.
Here is another means of comparison. Let's take the 50 clients/handlers case.
Here is what perf stat says about each run.
First, no patch:
{code}
Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads 50 -iterations 1000000 -keySize 50 -valueSize 100':

    4124147.969029 task-clock              #   2.179 CPUs utilized
       190,346,772 context-switches        #   0.046 M/sec
         9,156,832 CPU-migrations          #   0.002 M/sec
            48,592 page-faults             #   0.012 K/sec
 5,960,399,494,658 cycles                  #   1.445 GHz                     [83.29%]
 4,436,005,557,091 stalled-cycles-frontend #  74.42% frontend cycles idle    [83.31%]
 2,724,792,625,322 stalled-cycles-backend  #  45.71% backend cycles idle     [66.74%]
 2,867,985,594,350 instructions            #   0.48  insns per cycle
                                           #   1.55  stalled cycles per insn [83.41%]
   524,067,855,577 branches                # 127.073 M/sec                   [83.32%]
    17,507,214,729 branch-misses           #   3.34% of all branches         [83.35%]

    1892.292373360 seconds time elapsed
{code}
And with patch:
{code}
Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads 50 -iterations 1000000 -keySize 50 -valueSize 100':

    3527447.057674 task-clock              #   2.375 CPUs utilized
       130,085,238 context-switches        #   0.037 M/sec
        20,806,185 CPU-migrations          #   0.006 M/sec
            52,903 page-faults             #   0.015 K/sec
 5,230,740,387,202 cycles                  #   1.483 GHz                     [83.31%]
 3,998,305,194,614 stalled-cycles-frontend #  76.44% frontend cycles idle    [83.34%]
 2,469,198,968,381 stalled-cycles-backend  #  47.21% backend cycles idle     [66.69%]
 2,356,538,810,101 instructions            #   0.45  insns per cycle
                                           #   1.70  stalled cycles per insn [83.35%]
   417,045,222,142 branches                # 118.229 M/sec                   [83.33%]
    15,652,040,814 branch-misses           #   3.75% of all branches         [83.33%]

    1484.963721267 seconds time elapsed
{code}
We do fewer cycles when the patch is in place.
It is more apparent in the 200-thread case. First, nopatch:
{code}
Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads 200 -iterations 1000000 -keySize 50 -valueSize 100':

   13926656.466201 task-clock              #   2.559 CPUs utilized
       589,974,965 context-switches        #   0.042 M/sec
        38,417,368 CPU-migrations          #   0.003 M/sec
            88,956 page-faults             #   0.006 K/sec
20,510,293,875,760 cycles                  #   1.473 GHz                     [83.30%]
15,492,686,319,443 stalled-cycles-frontend #  75.54% frontend cycles idle    [83.30%]
 9,339,162,742,305 stalled-cycles-backend  #  45.53% backend cycles idle     [66.72%]
 9,529,936,978,732 instructions            #   0.46  insns per cycle
                                           #   1.63  stalled cycles per insn [83.38%]
 1,712,735,175,559 branches                # 122.983 M/sec                   [83.33%]
    61,211,156,300 branch-misses           #   3.57% of all branches         [83.35%]

    5442.635880066 seconds time elapsed
{code}
... and now with the patch:
{code}
Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads 200 -iterations 1000000 -keySize 50 -valueSize 100':

    8939605.570123 task-clock              #   4.830 CPUs utilized
       310,224,763 context-switches        #   0.035 M/sec
        97,373,084 CPU-migrations          #   0.011 M/sec
            69,187 page-faults             #   0.008 K/sec
13,396,936,480,713 cycles                  #   1.499 GHz                     [83.30%]
10,246,979,648,214 stalled-cycles-frontend #  76.49% frontend cycles idle    [83.32%]
 5,721,780,980,395 stalled-cycles-backend  #  42.71% backend cycles idle     [66.72%]
 6,385,137,954,357 instructions            #   0.48  insns per cycle
                                           #   1.60  stalled cycles per insn [83.38%]
 1,115,071,041,675 branches                # 124.734 M/sec                   [83.33%]
    36,286,432,631 branch-misses           #   3.25% of all branches         [83.33%]

    1851.022418695 seconds time elapsed
{code}
Fewer cycles, fewer context switches.
There is a bunch of ugliness in the above -- all those CPU migrations, how high the
stalls per instruction are, and the low throughput -- but I'm saving that for later.
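To put rough per-append numbers on the 200-client case: assuming each of the 200
threads does the full 1M iterations (which matches the per-thread ops/s HLogPE logs
below), here is a back-of-the-envelope normalization of those counters (illustrative
snippet only, not part of HBase):
{code}
// Back-of-the-envelope: normalize the 200-client perf stat counters above to
// per-append costs, assuming 200 threads x 1,000,000 iterations = 2e8 appends.
// Illustrative only; not part of HBase or HLogPerformanceEvaluation.
public class PerAppendCost {
  public static void main(String[] args) {
    final double appends = 200 * 1_000_000L;   // total appends in one run

    System.out.printf("nopatch:   %.0f cycles/append, %.2f context-switches/append%n",
        20_510_293_875_760L / appends, 589_974_965 / appends);
    System.out.printf("withpatch: %.0f cycles/append, %.2f context-switches/append%n",
        13_396_936_480_713L / appends, 310_224_763 / appends);
  }
}
{code}
That comes out to roughly 103k cycles and ~3 context switches per append without the
patch, versus ~67k cycles and ~1.5 with it.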
It seems like this patch has OK latency, relatively speaking. HLogPE reports a few
metrics. Here are some from the 200-client run toward the end. First, no patch:
{code}
2014-01-08 15:40:12,715 DEBUG [LruStats #0] hfile.LruBlockCache: Total=6.52 MB, free=6.20 GB, max=6.21 GB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=0, evicted=0, evictedPerRun=NaN
2014-01-08 15:40:13,576 INFO [t29] wal.HLogPerformanceEvaluation: t29 took 5400.756s 185.159ops/s
1/8/14 3:40:13 PM ==============================================================
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation:
appendMeter:
count = 58863065272
mean rate = 10894.75 bytes/ms
1-minute rate = 11129.26 bytes/ms
5-minute rate = 10953.71 bytes/ms
15-minute rate = 10883.10 bytes/ms
syncMeter:
count = 30894418
mean rate = 5.72 syncs/ms
1-minute rate = 5.68 syncs/ms
5-minute rate = 5.71 syncs/ms
15-minute rate = 5.71 syncs/ms
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.nanos-between-syncs:
syncHistogram:
min = -71842.00
max = 191525149.00
mean = 174964.59
stddev = 238420.37
median = 138823.50
75% <= 235886.00
95% <= 413183.25
98% <= 502136.48
99% <= 596775.15
99.9% <= 776253.14
{code}
and with patch:
{code}
1/8/14 1:10:34 AM ==============================================================
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation:
appendMeter:
count = 34714495550
mean rate = 18940.19 bytes/ms
1-minute rate = 19010.56 bytes/ms
5-minute rate = 18957.45 bytes/ms
15-minute rate = 17278.92 bytes/ms
syncMeter:
count = 7473900
mean rate = 4.08 syncs/ms
1-minute rate = 4.09 syncs/ms
5-minute rate = 4.08 syncs/ms
15-minute rate = 3.71 syncs/ms
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.countPerSync:
syncCountHistogram:
min = 0.00
max = 200.00
mean = 26.54
stddev = 9.90
median = 26.00
75% <= 32.00
95% <= 44.00
98% <= 48.00
99% <= 56.00
99.9% <= 99.25
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.nanos:
latencyHistogram:
min = 581216.00
max = 715683611.00
mean = 1843910.40
stddev = 1045928.90
median = 1764587.50
75% <= 2041533.75
95% <= 2477104.15
98% <= 2919530.58
99% <= 5164260.06
99.9% <= 11252801.10
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation.nanos-between-syncs:
syncHistogram:
min = 35265.00
max = 715541537.00
mean = 1217613.73
stddev = 806193.57
median = 1159088.00
75% <= 1270570.00
95% <= 1503642.10
98% <= 1749445.80
99% <= 3164119.49
99.9% <= 6123406.00
{code}
Almost double the throughput, but latency is not too bad -- about 2ms per write,
with most of it being sync time.
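Sanity-checking that reading with some quick unit conversions over the metrics above
(illustrative snippet only, not part of HLogPE):
{code}
// Quick conversions behind "almost double the throughput, ~2ms per write":
// appendMeter mean rates are in bytes/ms and the histograms are in nanos.
// Illustrative only; not part of HBase or HLogPerformanceEvaluation.
public class HLogPEMetricsCheck {
  public static void main(String[] args) {
    // appendMeter mean rate, withpatch vs. nopatch (200 clients)
    System.out.printf("throughput ratio:    %.2fx%n", 18940.19 / 10894.75);

    // withpatch latencyHistogram (nanos -> ms)
    System.out.printf("append latency:      median %.2f ms, 75%% <= %.2f ms%n",
        1_764_587.50 / 1_000_000, 2_041_533.75 / 1_000_000);

    // withpatch nanos-between-syncs median (nanos -> ms)
    System.out.printf("nanos-between-syncs: median %.2f ms%n",
        1_159_088.00 / 1_000_000);
  }
}
{code}
So roughly 1.7x the append throughput, with the median append latency around 1.8ms
and nanos-between-syncs around 1.2ms.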
Let me work on polishing the patch.
> Fix up the HBASE-8755 slowdown when low contention
> --------------------------------------------------
>
> Key: HBASE-10156
> URL: https://issues.apache.org/jira/browse/HBASE-10156
> Project: HBase
> Issue Type: Sub-task
> Components: wal
> Reporter: stack
> Assignee: stack
> Attachments: 10156.txt, 10156v2.txt, 10156v3.txt, 10156v4.txt,
> Disrupting.java
>
>
> HBASE-8755 slows our writes when only a few clients. Fix.