[
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Himanshu Vashishtha updated HBASE-10278:
----------------------------------------
Attachment: SwitchWriterFlow.pptx
I have attached a flow diagram of the switching process. It summarizes the
current trunk behaviour, what switching adds, and the steps involved and
invariants maintained while switching.
In a chat, Stack suggested measuring the cost of adding one level of
indirection between the SyncRunners (SR) and writer.sync(). (This patch adds a
thread pool in order to monitor the SRs and interrupt them, releasing them from
the current sync() call.)
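To make the indirection concrete, here is a minimal sketch of the general pattern, not the code in the patch. The names Writer, syncWithTimeout and InterruptibleSyncSketch and the timeout values are hypothetical and used only for illustration. The caller hands sync() to a pool thread, waits on the Future with a timeout, and cancels with interruption if the call takes too long. It only works if the blocked sync() call responds to Thread.interrupt().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InterruptibleSyncSketch {

    /** Hypothetical stand-in for the WAL writer; not the HBase API. */
    public interface Writer {
        void sync() throws Exception;
    }

    /**
     * Runs writer.sync() on a pool thread and waits up to timeoutMs.
     * Returns true if the sync finished in time; otherwise interrupts
     * the pool thread and returns false.
     */
    public static boolean syncWithTimeout(ExecutorService syncPool, Writer writer,
                                          long timeoutMs) throws Exception {
        // Returning null makes the lambda a Callable, so sync() may throw.
        Future<?> pending = syncPool.submit(() -> {
            writer.sync();
            return null;
        });
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) interrupts the thread stuck inside sync().
            pending.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService syncPool = Executors.newSingleThreadExecutor();
        try {
            Writer fast = () -> { };                // returns immediately
            Writer slow = () -> Thread.sleep(5000); // simulates a stuck pipeline

            System.out.println("fast sync completed: "
                + syncWithTimeout(syncPool, fast, 1000));  // true
            System.out.println("slow sync completed: "
                + syncWithTimeout(syncPool, slow, 100));   // false
            // The interrupt released the single pool thread, so this runs.
            System.out.println("next sync completed: "
                + syncWithTimeout(syncPool, fast, 1000));  // true
        } finally {
            syncPool.shutdownNow();
        }
    }
}
```

Every sync in this pattern costs a hand-off to another thread and a wake-up on completion, which is the kind of overhead the measurements are meant to quantify.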
Here are perf stat numbers from a 5-node cluster (hadoop2.2), for trunk and for
trunk + patch.
{code}
hbase-0.99.0-SNAPSHOT/bin/hbase
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -iterations
1000000 -threads 10 ; done
Trunk:
 Performance counter stats for '/home/himanshu/dists/hbase-0.99.0-SNAPSHOT/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -iterations 1000000 -threads 10':

     1891960.295558 task-clock               #    2.396 CPUs utilized
         55,076,890 context-switches         #    0.029 M/sec
          1,770,901 CPU-migrations           #    0.936 K/sec
             73,650 page-faults              #    0.039 K/sec
  2,853,602,378,588 cycles                   #    1.508 GHz                     [83.32%]
  2,126,410,331,760 stalled-cycles-frontend  #   74.52% frontend cycles idle    [83.31%]
  1,274,582,986,073 stalled-cycles-backend   #   44.67% backend cycles idle     [66.72%]
  1,511,777,502,744 instructions             #    0.53  insns per cycle
                                             #    1.41  stalled cycles per insn [83.37%]
    264,303,859,957 branches                 #  139.698 M/sec                   [83.33%]
      7,946,652,758 branch-misses            #    3.01% of all branches         [83.33%]

      789.767027189 seconds time elapsed
With patch:
 Performance counter stats for '/home/himanshu/10278-patch/hbase-0.99.0-SNAPSHOT/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -iterations 1000000 -threads 10':

     2184799.924959 task-clock               #    2.465 CPUs utilized
         67,056,548 context-switches         #    0.031 M/sec
          5,879,054 CPU-migrations           #    0.003 M/sec
             71,844 page-faults              #    0.033 K/sec
  3,293,173,733,811 cycles                   #    1.507 GHz                     [83.33%]
  2,402,602,947,823 stalled-cycles-frontend  #   72.96% frontend cycles idle    [83.33%]
  1,476,790,256,434 stalled-cycles-backend   #   44.84% backend cycles idle     [66.70%]
  1,878,777,337,255 instructions             #    0.57  insns per cycle
                                             #    1.28  stalled cycles per insn [83.38%]
    331,265,703,652 branches                 #  151.623 M/sec                   [83.30%]
     10,449,872,625 branch-misses            #    3.15% of all branches         [83.34%]

      886.148976683 seconds time elapsed
{code}
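With the patch there are more context switches (67,056,548 vs. 55,076,890,
about 22% more) and more CPU migrations (5,879,054 vs. 1,770,901), and the run
takes about 12% longer (886 s vs. 790 s).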
I am working on removing this one level of indirection so that we have fewer
threads but the SRs remain interruptible, so they can be unblocked from an
ongoing problematic sync call.
One option may be to merge the syncPool with the SRs (the workers in the pool
would be the actual SRs).
> Provide better write predictability
> -----------------------------------
>
> Key: HBASE-10278
> URL: https://issues.apache.org/jira/browse/HBASE-10278
> Project: HBase
> Issue Type: New Feature
> Reporter: Himanshu Vashishtha
> Assignee: Himanshu Vashishtha
> Attachments: 10278-wip-1.1.patch, Multiwaldesigndoc.pdf,
> SwitchWriterFlow.pptx
>
>
> Currently, HBase has one WAL per region server.
> Whenever there is any latency in the write pipeline (for whatever reason,
> such as a network blip, a node in the pipeline having a bad disk, etc.), the
> overall write latency suffers.
> Jonathan Hsieh and I analyzed various approaches to tackle this issue. We
> also looked at HBASE-5699, which talks about adding concurrent multi WALs.
> Along with performance numbers, we also focussed on design simplicity,
> minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98.
> Considering all these parameters, we propose a new HLog implementation with
> WAL Switching functionality.
> Please find attached the design doc for the same. It introduces the WAL
> Switching feature, and experiments/results of a prototype implementation,
> showing the benefits of this feature.
> The second goal of this work is to serve as a building block for a concurrent
> multiple WALs feature.
> Please review the doc.