[ https://issues.apache.org/jira/browse/KUDU-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155725#comment-17155725 ]
ASF subversion and git services commented on KUDU-636:
------------------------------------------------------
Commit a600f386aa2c341522638acb9af53fd45c469431 in kudu's branch
refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=a600f38 ]
KUDU-636. Use Arena for EncodedKeys
This updates EncodedKeyBuilder, RowSetKeyProbe, and EncodedKey to always
allocate from an Arena instead of from the heap. This reduces allocator
contention on the write path significantly and improves memory locality.
I measured this by running a tserver under 'perf stat' while using
'kudu perf loadgen' to insert 80M rows in total from 8 client threads. CPU
time on the tserver dropped by about 16% (task-clock went from 269853 msec to
227731 msec; user time fell by about 17%).
Before:
 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir /tmp/ts':

        269853.10 msec task-clock                #    6.862 CPUs utilized
           293066      context-switches          #    0.001 M/sec
            44541      cpu-migrations            #    0.165 K/sec
          2846435      page-faults               #    0.011 M/sec
    1110190206891      cycles                    #    4.114 GHz                      (83.33%)
     201895623339      stalled-cycles-frontend   #   18.19% frontend cycles idle     (83.33%)
     137095475307      stalled-cycles-backend    #   12.35% backend cycles idle      (83.32%)
     894201276095      instructions              #    0.81  insn per cycle
                                                 #    0.23  stalled cycles per insn  (83.33%)
     159095264762      branches                  #  589.562 M/sec                    (83.35%)
        639216492      branch-misses             #    0.40% of all branches          (83.35%)

    255.178068000 seconds user
     14.913394000 seconds sys
After:
 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir /tmp/ts':

        227730.62 msec task-clock                #    6.212 CPUs utilized
           263824      context-switches          #    0.001 M/sec
            45470      cpu-migrations            #    0.200 K/sec
          3165436      page-faults               #    0.014 M/sec
     931840588715      cycles                    #    4.092 GHz                      (83.25%)
     183214671009      stalled-cycles-frontend   #   19.66% frontend cycles idle     (83.40%)
     111864991317      stalled-cycles-backend    #   12.00% backend cycles idle      (83.35%)
     832636863971      instructions              #    0.89  insn per cycle
                                                 #    0.22  stalled cycles per insn  (83.40%)
     148228107120      branches                  #  650.892 M/sec                    (83.24%)
        563344647      branch-misses             #    0.38% of all branches          (83.35%)

    211.361472000 seconds user
     16.635265000 seconds sys
Change-Id: Ib46d0e2c31e03a7f319ceb0bf742e08ff74d7683
Reviewed-on: http://gerrit.cloudera.org:8080/16162
Reviewed-by: Alexey Serbin
Tested-by: Todd Lipcon
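
For readers unfamiliar with the technique named in the commit message above,
here is a minimal sketch of arena (bump-pointer) allocation. It is not Kudu's
actual Arena, EncodedKey, or EncodedKeyBuilder code: SimpleArena, KeyView, and
CopyKeyToArena are hypothetical names invented for illustration, and the
sketch assumes a single-threaded arena that frees everything at once when it
is destroyed. The point is only that many small key allocations become pointer
bumps inside a few large blocks, so the general-purpose allocator is touched
once per block rather than once per key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Hypothetical bump-pointer arena (illustration only, not Kudu's Arena).
// Not thread-safe; all memory is released when the arena is destroyed.
class SimpleArena {
 public:
  explicit SimpleArena(size_t block_size) : block_size_(block_size) {}

  // Returns `size` bytes aligned to `align`, which must be a power of two.
  void* Allocate(size_t size, size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t p = AlignUp(reinterpret_cast<uintptr_t>(cur_), align);
    if (blocks_.empty() || p + size > reinterpret_cast<uintptr_t>(end_)) {
      // Current block cannot satisfy the request: take one new block from
      // the heap. This is the only place the system allocator is called.
      size_t n = std::max(block_size_, size + align);
      blocks_.push_back(std::make_unique<uint8_t[]>(n));
      cur_ = blocks_.back().get();
      end_ = cur_ + n;
      p = AlignUp(reinterpret_cast<uintptr_t>(cur_), align);
    }
    cur_ = reinterpret_cast<uint8_t*>(p) + size;  // bump the pointer
    return reinterpret_cast<void*>(p);
  }

  size_t num_blocks() const { return blocks_.size(); }

 private:
  static uintptr_t AlignUp(uintptr_t v, size_t align) {
    return (v + align - 1) & ~(static_cast<uintptr_t>(align) - 1);
  }

  size_t block_size_;
  std::vector<std::unique_ptr<uint8_t[]>> blocks_;
  uint8_t* cur_ = nullptr;
  uint8_t* end_ = nullptr;
};

// Hypothetical stand-in for an encoded key: a view over arena-owned bytes.
// Trivially destructible, so it never needs an explicit destructor call.
struct KeyView {
  const uint8_t* data;
  size_t size;
};

// Copies `key` into the arena and returns an arena-allocated view of it.
KeyView* CopyKeyToArena(SimpleArena* arena, const std::string& key) {
  void* mem = arena->Allocate(sizeof(KeyView), alignof(KeyView));
  uint8_t* bytes = static_cast<uint8_t*>(arena->Allocate(key.size(), 1));
  std::memcpy(bytes, key.data(), key.size());
  return new (mem) KeyView{bytes, key.size()};
}
```

In this sketch a caller would create one SimpleArena per batch of rows, call
CopyKeyToArena for each key, and let the arena go out of scope when the batch
is done. There is no per-key free, which is where the reduced allocator
traffic would come from; the trade-off is that no key's memory can be
reclaimed before the whole arena is released.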
> optimization: we spend a lot of time in alloc/free
> --------------------------------------------------
>
> Key: KUDU-636
> URL: https://issues.apache.org/jira/browse/KUDU-636
> Project: Kudu
> Issue Type: Improvement
> Components: perf
> Affects Versions: Public beta
> Reporter: Todd Lipcon
> Priority: Major
>
> Looking at a workload in the cluster, several of the top 10 lines of perf
> report are tcmalloc-related. It seems like we don't do a good job of making
> use of the per-thread free-lists, and we end up in a lot of contention on the
> central free list. There are a few low-hanging fruit things we could do to
> improve this for a likely perf boost.