[
https://issues.apache.org/jira/browse/KUDU-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183639#comment-15183639
]
Todd Lipcon commented on KUDU-1366:
-----------------------------------
It seems the problem with wire_protocol-benchmark is a huge number of page faults.
jemalloc tries to maintain a maximum ratio of "reserved but not actually in
use" pages (sort of like what we try to do ourselves manually with tcmalloc).
In a microbenchmark which repeatedly allocates and deallocates a big chunk of
memory, it reacts after each deallocation with "aha! I can madvise away all
that memory!" whereas tcmalloc holds onto it. So, we use a lot more system time
with jemalloc.
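The effect can be reproduced outside Kudu with a minimal C sketch. It assumes Linux/POSIX `getrusage` and its `ru_minflt` counter, and `churn_page_faults` is a hypothetical helper, not part of wire_protocol-benchmark. It repeatedly allocates, touches and frees one large buffer, then reports how many minor page faults that churn caused. Running the same code under different allocators (for example via LD_PRELOAD) should show whether freed pages stay resident or are handed back to the kernel.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Minor (soft) page faults taken so far by this process, or -1 on error. */
static long minor_faults(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_minflt;
}

/* Hypothetical helper: allocate, touch and free a `bytes`-sized buffer
 * `iters` times and return the number of minor page faults this caused.
 * If the allocator returns freed pages to the kernel (munmap or madvise),
 * every iteration re-faults the whole buffer; if it keeps them resident,
 * roughly only the first iteration does. Returns -1 on error. */
long churn_page_faults(size_t bytes, int iters) {
    long before = minor_faults();
    if (before < 0) return -1;
    for (int i = 0; i < iters; i++) {
        unsigned char *buf = malloc(bytes);
        if (buf == NULL) return -1;
        /* Write one byte every 4 KiB through a volatile pointer so the
         * compiler cannot drop the stores; like memset, this touches every
         * page of the buffer. */
        volatile unsigned char *p = buf;
        for (size_t off = 0; off < bytes; off += 4096) p[off] = (unsigned char)i;
        free(buf);
    }
    long after = minor_faults();
    return after < 0 ? -1 : after - before;
}
```

For example, `churn_page_faults((size_t)64 << 20, 100)` should report a fault count close to one buffer's worth of pages when the allocator retains freed memory, and close to 100 times that when it purges after every free.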
https://www.mail-archive.com/[email protected]/msg00601.html
discusses a similar issue, which they say is partially fixed, though
apparently not enough.
Adding a junk malloc/memset of 1GB at the beginning of the test improves it, as
does setting MALLOC_CONF=lg_dirty_mult:-1, which tells it not to madvise back
these dirty pages. The number of page faults is still significantly higher
after that fix, but I didn't spend any time looking into why. I will re-run the
benchmarks with the above env var to see whether most of the regressions
disappear or get smaller while the improvements are maintained.
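Both workarounds can be sketched in C. This assumes jemalloc 4.x built without a symbol prefix, which reads a global `malloc_conf` string at startup using the same `option:value` syntax as the MALLOC_CONF environment variable (with a `je_` prefix the symbol is `je_malloc_conf`). `prewarm_heap` is a hypothetical helper, and keeping its buffer live is an assumption about how the junk allocation was done.

```c
#include <stddef.h>
#include <stdlib.h>

/* Compiled-in equivalent of MALLOC_CONF=lg_dirty_mult:-1 (assumes an
 * unprefixed jemalloc 4.x). A ratio of -1 disables dirty-page purging, so
 * freed pages stay mapped instead of being madvise()d back to the kernel.
 * Allocators other than jemalloc ignore this symbol. */
const char *malloc_conf = "lg_dirty_mult:-1";

/* Hypothetical helper mirroring the "junk malloc/memset" workaround:
 * allocate and touch `bytes` up front and keep the block live. Since
 * jemalloc's purge threshold is a ratio of active to dirty pages, holding a
 * large active allocation is one plausible reason this helps. */
void *prewarm_heap(size_t bytes) {
    unsigned char *junk = malloc(bytes);
    if (junk == NULL) return NULL;
    /* Volatile stores so the touches cannot be optimized away. */
    volatile unsigned char *p = junk;
    for (size_t off = 0; off < bytes; off += 4096) p[off] = 0xA5;
    return junk; /* caller is expected to keep this alive for the whole run */
}
```

The global only takes effect when the binary is actually linked against jemalloc. Note that lg_dirty_mult exists in jemalloc 4.x only; jemalloc 5.0 replaced it with dirty_decay_ms, where -1 likewise disables purging.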
> Consider switching to jemalloc
> ------------------------------
>
> Key: KUDU-1366
> URL: https://issues.apache.org/jira/browse/KUDU-1366
> Project: Kudu
> Issue Type: Bug
> Components: build
> Reporter: Todd Lipcon
> Attachments: Kudu Benchmarks.pdf
>
>
> We spend a fair amount of time in the allocator. While we could spend some
> time trying to use arenas more, it's also worth considering switching
> allocators. I ran a few quick tests with jemalloc 4.1 and it seems like it
> might be better than the version of tcmalloc that we use (and it has much more
> active development).