This is an automated email from the ASF dual-hosted git repository.
uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 87c976a ARROW-2855: [C++] Blog post that outlines the benefits of
using jemalloc
87c976a is described below
commit 87c976adcd438d43719912d48e200032729959d3
Author: Korn, Uwe <[email protected]>
AuthorDate: Sat Jul 21 18:04:43 2018 +0200
ARROW-2855: [C++] Blog post that outlines the benefits of using jemalloc
I will update the date once this has passed review.
Author: Korn, Uwe <[email protected]>
Closes #2271 from xhochy/ARROW-2855 and squashes the following commits:
dbb7bc4a <Korn, Uwe> Prepare post for publish
7410ada2 <Korn, Uwe> Review comments and typos
298f4eb8 <Korn, Uwe> ARROW-2855: Blog post that outlines the benefits of
using jemalloc
---
site/_posts/2018-07-20-jemalloc.md | 141 +++++++++++++++++++++++++++++++++++++
1 file changed, 141 insertions(+)
diff --git a/site/_posts/2018-07-20-jemalloc.md
b/site/_posts/2018-07-20-jemalloc.md
new file mode 100644
index 0000000..ea47665
--- /dev/null
+++ b/site/_posts/2018-07-20-jemalloc.md
@@ -0,0 +1,141 @@
+---
+layout: post
+title: "Faster, scalable memory allocations in Apache Arrow with jemalloc"
+date: "2018-07-20 12:00:00 +0100"
+author: uwe
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+With the release of the 0.9 version of Apache Arrow, we have switched our
+default allocator for array buffers from the system allocator to jemalloc on
+macOS and Linux. This applies to the C++, GLib, and Python implementations of
+Arrow. Changing the default allocator is usually done to avoid problems that
+occur with many small, frequent (de)allocations. In contrast, in Arrow we
+normally deal with large in-memory datasets. While jemalloc provides good
+strategies for [avoiding RAM fragmentation for allocations that are smaller
+than a memory page (4 KB)][1], it also provides functionality that improves
+performance on allocations that span several memory pages.
+
+Outside of Apache Arrow, [jemalloc powers the infrastructure of Facebook][2]
+(this is also where most of its development happens). It is also the
+[default allocator in Rust][3], and it helps [Redis reduce memory
+fragmentation on Linux][4] ("Allocator").
+
+One allocation specialty that we require in Arrow is that memory should be
+64-byte aligned. This is so that we can get the most performance out of SIMD
+instruction sets like AVX. While most modern SIMD instructions also work on
+unaligned memory, their performance is much better on aligned memory. To get
+the best performance for our analytical applications, we want all memory to be
+allocated such that SIMD performance is maximized.
+
+For aligned allocations, the standard C and POSIX APIs only provide functions
+to create aligned memory: C11's `aligned_alloc(size_t alignment, size_t size)`
+and POSIX's `posix_memalign(void **ptr, size_t alignment, size_t size)`. But
+neither of them caters for expansion of an existing allocation. While the
+`realloc` function can often expand allocations without moving them
+physically, it does not ensure that the alignment is kept in the case the
+allocation is moved.
+
+In the case when Arrow was built without jemalloc enabled, this resulted in
+copying the data on each expansion of an allocation. To reduce the number of
+memory copies, we use jemalloc's `*allocx()` APIs to create, modify and free
+aligned allocations. One of the typical tasks where this gives us a major
+speedup is the incremental construction of an Arrow table that consists of
+several columns. We often don't know the size of the table in advance and need
+to expand our allocations as the data is loaded.
+
+To incrementally build a vector, doubling its capacity on each expansion, we
+would use the following C code with the standard APIs:
+
+```c
+#include <stdlib.h>
+#include <string.h>
+
+size_t size = 128 * 1024;
+void* ptr = aligned_alloc(64, size);
+for (int i = 0; i < 10; i++) {
+ size_t new_size = size * 2;
+ void* ptr2 = aligned_alloc(64, new_size);
+ memcpy(ptr2, ptr, size);
+ free(ptr);
+ ptr = ptr2;
+ size = new_size;
+}
+free(ptr);
+```
+
+With jemalloc's special APIs, we are able to omit the explicit call to
+`memcpy`. In the case where a memory expansion cannot be done in place, the
+allocator still copies the data internally, but this is no longer needed on
+every expansion. This simplifies our user code to:
+
+```c
+#include <jemalloc/jemalloc.h>
+
+size_t size = 128 * 1024;
+void* ptr = mallocx(size, MALLOCX_ALIGN(64));
+for (int i = 0; i < 10; i++) {
+ size *= 2;
+ ptr = rallocx(ptr, size, MALLOCX_ALIGN(64));
+}
+dallocx(ptr, MALLOCX_ALIGN(64));
+```
+
+To see the real-world benefits of using jemalloc, we look at the benchmarks in
+Arrow C++. There we have modeled a typical use case of incrementally building
+up an array of primitive values. For the build-up of the array, we don't know
+the number of elements in the final array, so we need to continuously expand
+the memory region in which the data is stored. The code for this benchmark is
+part of the `builder-benchmark` in the Arrow C++ sources as
+`BuildPrimitiveArrayNoNulls`.
+
+Runtimes without `jemalloc`:
+
+```
+BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s
+```
+
+Runtimes with `jemalloc`:
+
+```
+BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s
+BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s
+```
+
+The benchmark was run three times for each configuration to see the
+performance differences. The first run in each configuration yielded the same
+performance, but in all subsequent runs, the version using jemalloc was about
+twice as fast. In these cases, the memory region that was used for
+constructing the array could be expanded in place without moving the data
+around. This was possible because memory pages were assigned to the process
+that were unused but not yet reclaimed by the operating system. Without
+`jemalloc`, we cannot make use of them, simply because the default allocator
+has no API that provides aligned reallocation.
+
+[1]: https://zapier.com/engineering/celery-python-jemalloc/
+[2]: https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/
+[3]: https://github.com/rust-lang/rust/pull/6895
+[4]: http://download.redis.io/redis-stable/README.md