[
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135250#comment-15135250
]
stack commented on HBASE-14790:
-------------------------------
I gave it a try on a little cluster. Not all the metrics are hooked up and it
looks like the patched version did 1/10th less operations. That said, the patch
looks to be 2x faster on the simple WALPE test with default args. it completes
in half the time with half the CS, half the branch-misses, with a better
instruction per cycle count. Patch is looking promising. Let me know when you'd
like a bit of review [~Apache9]
In both cases, did this against a cluster of nine nodes:
{code}
$ perf stat ./hbase/bin/hbase --config ~/conf_hbase
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation
{code}
Here is the end of the log from the UNPATCHED run (master branch as of now):
{code}
...
-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
count = 1000000
min = 513398
max = 2166155
mean = 721937.32
stddev = 97572.13
median = 717876.00
75% <= 775738.00
95% <= 869750.00
98% <= 912876.00
99% <= 950262.00
99.9% <= 1314831.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
count = 1000008
min = 1
max = 1
mean = 1.00
stddev = 0.00
median = 1.00
75% <= 1.00
95% <= 1.00
98% <= 1.00
99% <= 1.00
99.9% <= 1.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
count = 1000008
min = 453139
max = 1871512
mean = 639040.49
stddev = 88377.41
median = 633963.00
75% <= 689163.00
95% <= 764942.00
98% <= 805957.00
99% <= 836714.00
99.9% <= 1871512.00
-- Meters
----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
count = 557000000
mean rate = 763472.57 events/second
1-minute rate = 763802.93 events/second
5-minute rate = 699050.48 events/second
15-minute rate = 424725.42 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
count = 1000008
mean rate = 1370.69 events/second
1-minute rate = 1371.30 events/second
5-minute rate = 1255.04 events/second
15-minute rate = 762.53 events/second
2016-02-05 15:08:25,746 INFO [WALPerfEval.logRoller] regionserver.LogRoller:
LogRoller exiting.
Performance counter stats for './hbase/bin/hbase --config
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation':
322385.687852 task-clock # 0.438 CPUs utilized
8,158,772 context-switches # 0.025 M/sec
575,922 cpu-migrations # 0.002 M/sec
3,064,079 page-faults # 0.010 M/sec
474,704,238,311 cycles # 1.472 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
204,341,980,679 instructions # 0.43 insns per cycle
36,933,173,991 branches # 114.562 M/sec
787,891,661 branch-misses # 2.13% of all branches
735.817239034 seconds time elapsed
{code}
Here is the end of the log from the PATCHED run:
{code}
...
-- Histograms ------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
count = 901306
min = 156502
max = 364739
mean = 231465.79
stddev = 25166.33
median = 227923.00
75% <= 246449.00
95% <= 274560.00
98% <= 288228.00
99% <= 307981.00
99.9% <= 364739.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
count = 0
min = 0
max = 0
mean = 0.00
stddev = 0.00
median = 0.00
75% <= 0.00
95% <= 0.00
98% <= 0.00
99% <= 0.00
99.9% <= 0.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
count = 0
min = 0
max = 0
mean = 0.00
stddev = 0.00
median = 0.00
75% <= 0.00
95% <= 0.00
98% <= 0.00
99% <= 0.00
99.9% <= 0.00
-- Meters ----------------------------------------------------------------------
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
count = 502038025
mean rate = 2283690.25 events/second
1-minute rate = 2268881.93 events/second
5-minute rate = 1180418.70 events/second
15-minute rate = 486667.36 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
count = 0
mean rate = 0.00 events/second
1-minute rate = 0.00 events/second
5-minute rate = 0.00 events/second
15-minute rate = 0.00 events/second
Performance counter stats for './hbase/bin/hbase --config
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation':
256123.078857 task-clock # 1.030 CPUs utilized
3,359,633 context-switches # 0.013 M/sec
65,083 cpu-migrations # 0.254 K/sec
3,144,010 page-faults # 0.012 M/sec
412,859,061,607 cycles # 1.612 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
199,715,055,614 instructions # 0.48 insns per cycle
36,232,946,697 branches # 141.467 M/sec
461,875,115 branch-misses # 1.27% of all branches
248.692785230 seconds time elapsed
{code}
> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of the features if we only want to
> log WAL. For example, we do not need pipeline recovery since we could just
> close the old logger and open a new one. And also, we do not need to write
> multiple blocks since we could also open a new logger if the old file is too
> large.
> And the most important thing is that, it is hard to handle all the corner
> cases to avoid data loss or data inconsistency(such as HBASE-14004) when
> using original DFSOutputStream due to its complicated logic. And the
> complicated logic also force us to use some magical tricks to increase
> performance. For example, we need to use multiple threads to call {{hflush}}
> when logging, and now we use 5 threads. But why 5 not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when
> logging WAL. For correctness, and also for performance.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)