Repository: arrow-site Updated Branches: refs/heads/asf-site faaac348c -> 97cc0e5af
Publish jemalloc post Project: http://git-wip-us.apache.org/repos/asf/arrow-site/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow-site/commit/97cc0e5a Tree: http://git-wip-us.apache.org/repos/asf/arrow-site/tree/97cc0e5a Diff: http://git-wip-us.apache.org/repos/asf/arrow-site/diff/97cc0e5a Branch: refs/heads/asf-site Commit: 97cc0e5af7113a0fa726e9f4851b6549380b224b Parents: faaac34 Author: Korn, Uwe <[email protected]> Authored: Sat Jul 21 18:09:04 2018 +0200 Committer: Korn, Uwe <[email protected]> Committed: Sat Jul 21 18:09:04 2018 +0200 ---------------------------------------------------------------------- blog/2018/07/20/jemalloc/index.html | 266 +++++++++++++++++++++++++++++++ blog/index.html | 145 +++++++++++++++++ docs/ipc.html | 5 +- docs/memory_layout.html | 12 +- docs/metadata.html | 4 +- feed.xml | 200 +++++++++++++---------- powered_by/index.html | 4 +- 7 files changed, 539 insertions(+), 97 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/blog/2018/07/20/jemalloc/index.html ---------------------------------------------------------------------- diff --git a/blog/2018/07/20/jemalloc/index.html b/blog/2018/07/20/jemalloc/index.html new file mode 100644 index 0000000..ca888a4 --- /dev/null +++ b/blog/2018/07/20/jemalloc/index.html @@ -0,0 +1,266 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link 
href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a 
href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + <li><a href="/docs/js">Javascript API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Faster, scalable memory allocations in Apache Arrow with jemalloc + <a href="/blog/2018/07/20/jemalloc/" class="permalink" title="Permalink">â</a> + </h2> + + + + <div class="panel"> + <div 
class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 20 Jul 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://github.com/xhochy"><i class="fa fa-user"></i> Uwe Korn (uwe)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>With the release of the 0.9 version of Apache Arrow, we have switched our +default allocator for array buffers from the system allocator to jemalloc on +OSX and Linux. This applies to the C++/GLib/Python implementations of Arrow. +Changing the default allocator is normally done to avoid problems +that occur with many small, frequent (de)allocations. In contrast, in Arrow we +normally deal with large in-memory datasets. While jemalloc provides good +strategies for <a href="https://zapier.com/engineering/celery-python-jemalloc/">avoiding RAM fragmentation for allocations that are smaller than +a memory page (4 KB)</a>, it also provides functionality that improves +performance on allocations that span several memory pages.</p> + +<p>Outside of Apache Arrow, <a href="https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/">jemalloc powers the infrastructure of Facebook</a> +(this is also where most of its development happens). It is also used as the +<a href="https://github.com/rust-lang/rust/pull/6895">default allocator in Rust</a>, and it helps <a href="http://download.redis.io/redis-stable/README.md">Redis reduce memory +fragmentation on Linux</a> (see the “Allocator” section of the Redis README).</p> + +<p>One allocation specialty that we require in Arrow is that memory should be +64-byte aligned. This is so that we can get the most performance out of SIMD +instruction sets like AVX. While most modern SIMD instructions also work on +unaligned memory, their performance is much better on aligned memory. To get the +best performance for our analytical applications, we want all memory to be +allocated such that SIMD performance is maximized.</p> + +<p>For aligned allocations, the POSIX APIs only provide the +<code class="highlighter-rouge">aligned_alloc(size_t alignment, size_t size)</code> function to +allocate aligned memory. There is also +<code class="highlighter-rouge">posix_memalign(void **ptr, size_t alignment, size_t size)</code>, which returns a new +aligned allocation through its first argument. But neither of them caters for expanding +an existing allocation. While the <code class="highlighter-rouge">realloc</code> function can often expand allocations +without moving them physically, it does not guarantee that the alignment is kept +when an allocation has to be moved.</p> + +<p>When Arrow is built without jemalloc, this results +in copying the data on each expansion of an allocation. To reduce the number +of memory copies, we use jemalloc’s <code class="highlighter-rouge">*allocx()</code> APIs to create, modify and free +aligned allocations. One of the typical tasks where this gives us a major +speedup is the incremental construction of an Arrow table that consists of +several columns. We often don’t know the size of the table in advance and need +to expand our allocations as the data is loaded.</p> + +<p>To incrementally build a vector using memory expansion by a factor of 2, we +would use the following C code with the standard POSIX APIs:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="kt">size_t</span> <span class="n">new_size</span> <span class="o">=</span> <span class="n">size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> + <span class="kt">void</span><span class="o">*</span> <span class="n">ptr2</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">new_size</span><span class="p">);</span> + <span class="n">memcpy</span><span class="p">(</span><span class="n">ptr2</span><span class="p">,</span> <span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> + <span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">ptr2</span><span class="p">;</span> + <span class="n">size</span> <span class="o">=</span> <span class="n">new_size</span><span class="p">;</span> +<span class="p">}</span> +<span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> +</code></pre> +</div> + +<p>With jemalloc’s special APIs, we are able to omit the explicit call to <code class="highlighter-rouge">memcpy</code>. +Where an expansion cannot be done in place, the allocator still performs the copy +internally, but it is no longer needed on every expansion. This simplifies our user code +to:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">mallocx</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="n">size</span> <span class="o">*=</span> <span class="mi">2</span><span class="p">;</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">rallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="p">}</span> +<span class="n">dallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +</code></pre> +</div> + +<p>To see the real-world benefits of using jemalloc, we look at the benchmarks in +Arrow C++. There we have modeled a typical use case of incrementally building up +an array of primitive values. For the build-up of the array, we don’t know the +number of elements in the final array, so we need to continuously expand the +memory region in which the data is stored. The code for this benchmark is part +of the <code class="highlighter-rouge">builder-benchmark</code> in the Arrow C++ sources as +<code class="highlighter-rouge">BuildPrimitiveArrayNoNulls</code>.</p> + +<p>Runtimes without <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s +</code></pre> +</div> + +<p>Runtimes with <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s +</code></pre> +</div> + 
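The non-jemalloc growth path measured above — allocate a fresh 64-byte-aligned region and copy — can be sketched in portable C11, without any jemalloc dependency. This is only an illustration of the copy-on-grow scheme the post describes; `aligned_grow` is a hypothetical helper name, not an Arrow or jemalloc API:

```c
#include <stdlib.h>
#include <string.h>

/* Fallback growth step: every expansion allocates a new 64-byte-aligned
 * region and copies the old contents into it.  The memcpy below is exactly
 * the copy that jemalloc's rallocx() can often avoid by expanding in place.
 * Note: C11 aligned_alloc() requires the requested size to be a multiple
 * of the alignment. */
static void *aligned_grow(void *ptr, size_t old_size, size_t new_size) {
    void *fresh = aligned_alloc(64, new_size);
    if (fresh == NULL) {
        return NULL; /* caller still owns ptr on failure */
    }
    memcpy(fresh, ptr, old_size);
    free(ptr);
    return fresh;
}
```

With jemalloc enabled, this whole step collapses to a single `rallocx(ptr, new_size, MALLOCX_ALIGN(64))` call, as in the post's second snippet.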
+<p>The benchmark was run three times for each configuration to see the performance +differences. The first run in each configuration yielded the same performance, but +in all subsequent runs the version using jemalloc was about twice as fast. In +these cases, the memory region that was used for constructing the array could be +expanded in place without moving the data around. This was possible because there +were memory pages assigned to the process that were unused but not yet reclaimed by +the operating system. Without <code class="highlighter-rouge">jemalloc</code>, we cannot make use of them, simply because +the default allocator has no API that provides aligned +reallocation.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/blog/index.html ---------------------------------------------------------------------- diff --git a/blog/index.html b/blog/index.html index ddad423..2ef6dd9 100644 --- a/blog/index.html +++ b/blog/index.html @@ -124,6 +124,151 @@ <div class="container"> <h2> + Faster, scalable memory allocations in Apache Arrow with jemalloc + <a href="/blog/2018/07/20/jemalloc/" class="permalink" title="Permalink">â</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 20 Jul 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://github.com/xhochy"><i class="fa fa-user"></i> Uwe Korn (uwe)</a> + </div> + </div> + </div> + <!-- + +--> + +<p>With the release of the 0.9 version of Apache Arrow, we have switched our +default 
allocator for array buffers from the system allocator to jemalloc on +OSX and Linux. This applies to the C++/GLib/Python implementations of Arrow. +Changing the default allocator is normally done to avoid problems +that occur with many small, frequent (de)allocations. In contrast, in Arrow we +normally deal with large in-memory datasets. While jemalloc provides good +strategies for <a href="https://zapier.com/engineering/celery-python-jemalloc/">avoiding RAM fragmentation for allocations that are smaller than +a memory page (4 KB)</a>, it also provides functionality that improves +performance on allocations that span several memory pages.</p> + +<p>Outside of Apache Arrow, <a href="https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/">jemalloc powers the infrastructure of Facebook</a> +(this is also where most of its development happens). It is also used as the +<a href="https://github.com/rust-lang/rust/pull/6895">default allocator in Rust</a>, and it helps <a href="http://download.redis.io/redis-stable/README.md">Redis reduce memory +fragmentation on Linux</a> (see the “Allocator” section of the Redis README).</p> + +<p>One allocation specialty that we require in Arrow is that memory should be +64-byte aligned. This is so that we can get the most performance out of SIMD +instruction sets like AVX. While most modern SIMD instructions also work on +unaligned memory, their performance is much better on aligned memory. To get the +best performance for our analytical applications, we want all memory to be +allocated such that SIMD performance is maximized.</p> + +<p>For aligned allocations, the POSIX APIs only provide the +<code class="highlighter-rouge">aligned_alloc(size_t alignment, size_t size)</code> function to +allocate aligned memory. There is also +<code class="highlighter-rouge">posix_memalign(void **ptr, size_t alignment, size_t size)</code>, which returns a new +aligned allocation through its first argument. But neither of them caters for expanding +an existing allocation. While the <code class="highlighter-rouge">realloc</code> function can often expand allocations +without moving them physically, it does not guarantee that the alignment is kept +when an allocation has to be moved.</p> + +<p>When Arrow is built without jemalloc, this results +in copying the data on each expansion of an allocation. To reduce the number +of memory copies, we use jemalloc’s <code class="highlighter-rouge">*allocx()</code> APIs to create, modify and free +aligned allocations. One of the typical tasks where this gives us a major +speedup is the incremental construction of an Arrow table that consists of +several columns. We often don’t know the size of the table in advance and need +to expand our allocations as the data is loaded.</p> + +<p>To incrementally build a vector using memory expansion by a factor of 2, we +would use the following C code with the standard POSIX APIs:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="kt">size_t</span> <span class="n">new_size</span> <span class="o">=</span> <span class="n">size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> + <span class="kt">void</span><span class="o">*</span> <span class="n">ptr2</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">new_size</span><span class="p">);</span> + <span class="n">memcpy</span><span class="p">(</span><span class="n">ptr2</span><span class="p">,</span> <span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> + <span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">ptr2</span><span class="p">;</span> + <span class="n">size</span> <span class="o">=</span> <span class="n">new_size</span><span class="p">;</span> +<span class="p">}</span> +<span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> +</code></pre> +</div> + +<p>With jemalloc’s special APIs, we are able to omit the explicit call to <code class="highlighter-rouge">memcpy</code>. +Where an expansion cannot be done in place, the allocator still performs the copy +internally, but it is no longer needed on every expansion. This simplifies our user code +to:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">mallocx</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="n">size</span> <span class="o">*=</span> <span class="mi">2</span><span class="p">;</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">rallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="p">}</span> +<span class="n">dallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +</code></pre> +</div> + +<p>To see the real-world benefits of using jemalloc, we look at the benchmarks in +Arrow C++. There we have modeled a typical use case of incrementally building up +an array of primitive values. For the build-up of the array, we don’t know the +number of elements in the final array, so we need to continuously expand the +memory region in which the data is stored. The code for this benchmark is part +of the <code class="highlighter-rouge">builder-benchmark</code> in the Arrow C++ sources as +<code class="highlighter-rouge">BuildPrimitiveArrayNoNulls</code>.</p> + +<p>Runtimes without <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s +</code></pre> +</div> + +<p>Runtimes with <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s +</code></pre> +</div> + +<p>The benchmark was run three times for each configuration to see the performance +differences. The first run in each configuration yielded the same performance, but +in all subsequent runs the version using jemalloc was about twice as fast. In +these cases, the memory region that was used for constructing the array could be +expanded in place without moving the data around. This was possible because there +were memory pages assigned to the process that were unused but not yet reclaimed by +the operating system. 
Without <code class="highlighter-rouge">jemalloc</code>, we cannot make use of them, simply because +the default allocator has no API that provides aligned +reallocation.</p> + + </div> + + + + + + <div class="container"> + <h2> A Native Go Library for Apache Arrow <a href="/blog/2018/03/22/go-code-donation/" class="permalink" title="Permalink">â</a> </h2> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/docs/ipc.html ---------------------------------------------------------------------- diff --git a/docs/ipc.html b/docs/ipc.html index 5022c80..00f7817 100644 --- a/docs/ipc.html +++ b/docs/ipc.html @@ -366,8 +366,9 @@ tools. Arrow implementations in general are not required to implement this data format, though we provide a reference implementation in C++.</p> <p>When writing a standalone encapsulated tensor message, we use the format as -indicated above, but additionally align the starting offset (if writing to a -shared memory region) to be a multiple of 8:</p> +indicated above, but additionally align the starting offset of the metadata as +well as the starting offset of the tensor body (if writing to a shared memory +region) to be multiples of 64 bytes:</p> <div class="highlighter-rouge"><pre class="highlight"><code><PADDING> <metadata size: int32> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/docs/memory_layout.html ---------------------------------------------------------------------- diff --git a/docs/memory_layout.html b/docs/memory_layout.html index ff8f9e8..62a383b 100644 --- a/docs/memory_layout.html +++ b/docs/memory_layout.html @@ -141,7 +141,7 @@ <h2 id="definitions--terminology">Definitions / Terminology</h2> -<p>Since different projects have used differents words to describe various +<p>Since different projects have used different words to describe various concepts, here is a small glossary to help disambiguate.</p> <ul> @@ -406,7 +406,7 @@ values array to 2<sup>31</sup>-1.</li> <p>The offsets array encodes 
a start position in the values array, and the length of the value in each slot is computed using the first difference with the next -element in the offsets array. For example. the position and length of slot j is +element in the offsets array. For example, the position and length of slot j is computed as:</p> <div class="highlighter-rouge"><pre class="highlight"><code>slot_position = offsets[j] @@ -444,7 +444,7 @@ logical type.</p> * Length: 7, Null count: 0 * Null bitmap buffer: Not required - | Bytes 0-7 | Bytes 8-63 | + | Bytes 0-6 | Bytes 7-63 | |------------|-------------| | joemark | unspecified | </code></pre> @@ -474,7 +474,7 @@ logical type.</p> * Offsets buffer (int32) - | Bytes 0-28 | Bytes 29-63 | + | Bytes 0-27 | Bytes 28-63 | |----------------------|-------------| | 0, 2, 4, 7, 7, 8, 10 | unspecified | @@ -499,7 +499,7 @@ type metadata, not the physical memory layout.</p> <p>A struct array does not have any additional allocated physical storage for its values. A struct array must still have an allocated null bitmap, if it has one or more null values.</p> -<p>Physically, a struct type has one child array for each field.</p> +<p>Physically, a struct type has one child array for each field. The child arrays are independent and need not be adjacent to each other in memory.</p> <p>For example, the struct (field names shown here as strings for illustration purposes)</p> @@ -747,7 +747,7 @@ reinterpreted as a non-nested array.</p> <p>Similar to structs, a particular child array may have a non-null slot even if the null bitmap of the parent union array indicates the slot is null. 
Additionally, a child array may have a non-null slot even if -the the types array indicates that a slot contains a different type at the index.</p> +the types array indicates that a slot contains a different type at the index.</p> <h2 id="dictionary-encoding">Dictionary encoding</h2> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/docs/metadata.html ---------------------------------------------------------------------- diff --git a/docs/metadata.html b/docs/metadata.html index 858f0c0..78f288e 100644 --- a/docs/metadata.html +++ b/docs/metadata.html @@ -194,7 +194,7 @@ the columns. The Flatbuffers IDL for a field is:</p> <p>The <code class="highlighter-rouge">type</code> is the logical type of the field. Nested types, such as List, Struct, and Union, have a sequence of child fields.</p> -<p>a JSON representation of the schema is also provided: +<p>A JSON representation of the schema is also provided: Field:</p> <div class="highlighter-rouge"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nt">"name"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"name_of_the_field"</span><span class="p">,</span><span class="w"> @@ -510,7 +510,7 @@ according to the child logical type (e.g. 
<code class="highlighter-rouge">List&l <p>We specify two logical types for variable length bytes:</p> <ul> - <li><code class="highlighter-rouge">Utf8</code> data is unicode values with UTF-8 encoding</li> + <li><code class="highlighter-rouge">Utf8</code> data is Unicode values with UTF-8 encoding</li> <li><code class="highlighter-rouge">Binary</code> is any other variable length bytes</li> </ul> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/feed.xml ---------------------------------------------------------------------- diff --git a/feed.xml b/feed.xml index 98d48d5..8ae2618 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,117 @@ -<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-04-09T04:33:24-04:00</updated><id>/</id><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- +<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-07-21T12:07:24-04:00</updated><id>/</id><entry><title type="html">Faster, scalable memory allocations in Apache Arrow with jemalloc</title><link href="/blog/2018/07/20/jemalloc/" rel="alternate" type="text/html" title="Faster, scalable memory allocations in Apache Arrow with jemalloc" 
/><published>2018-07-20T07:00:00-04:00</published><updated>2018-07-20T07:00:00-04:00</updated><id>/blog/2018/07/20/jemalloc</id><content type="html" xml:base="/blog/2018/07/20/jemalloc/"><!-- + +--> + +<p>With the release of the 0.9 version of Apache Arrow, we have switched our +default allocator for array buffers from the system allocator to jemalloc on +OSX and Linux. This applies to the C++/GLib/Python implementations of Arrow. +Changing the default allocator is usually done to avoid problems +that occur with many small, frequent (de)allocations. In contrast, in Arrow we +normally deal with large in-memory datasets. While jemalloc provides good +strategies for <a href="https://zapier.com/engineering/celery-python-jemalloc/">avoiding RAM fragmentation for allocations that are smaller than +a memory page (4 KB)</a>, it also provides functionality that improves +performance on allocations that span several memory pages.</p> + +<p>Outside of Apache Arrow, <a href="https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/">jemalloc powers the infrastructure of Facebook</a> +(this is also where most of its development happens). It is also used as the +<a href="https://github.com/rust-lang/rust/pull/6895">default allocator in Rust</a>, and it helps <a href="http://download.redis.io/redis-stable/README.md">Redis reduce memory +fragmentation on Linux</a> (see the "Allocator" section of its README).</p> + +<p>One special requirement we have for allocations in Arrow is that memory should be +64-byte aligned. This is so that we can get the most performance out of SIMD +instruction sets like AVX. While most modern SIMD instructions also work on +unaligned memory, their performance is much better on aligned memory.
To get the +best performance for our analytical applications, we want all memory to be +allocated such that SIMD performance is maximized.</p> + +<p>For aligned allocations, the C11/POSIX APIs only provide the +<code class="highlighter-rouge">aligned_alloc(size_t alignment, size_t size)</code> function to +allocate aligned memory. There is also +<code class="highlighter-rouge">posix_memalign(void **ptr, size_t alignment, size_t size)</code>, which +allocates aligned memory and returns it through an out-parameter. But neither of them caters for expanding +an existing allocation. While the <code class="highlighter-rouge">realloc</code> function can often expand allocations +without moving them physically, it does not guarantee that the alignment is +kept when the allocation has to be moved.</p> + +<p>When Arrow was built without jemalloc enabled, this resulted +in copying the data on each expansion of an allocation. To reduce the number +of memory copies, we use jemalloc's <code class="highlighter-rouge">*allocx()</code>-APIs to create, modify and free +aligned allocations. One of the typical tasks where this gives us a major +speedup is the incremental construction of an Arrow table that consists of +several columns.
We often don't know the size of the table in advance and need +to expand our allocations as the data is loaded.</p> + +<p>To incrementally build a vector, doubling its memory with each expansion, we +would use the following C code with the standard POSIX APIs:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="kt">size_t</span> <span class="n">new_size</span> <span class="o">=</span> <span class="n">size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> + <span class="kt">void</span><span class="o">*</span> <span class="n">ptr2</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">new_size</span><span class="p">);</span> + <span class="n">memcpy</span><span class="p">(</span><span class="n">ptr2</span><span class="p">,</span> <span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> + <span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> + <span class="n">ptr</span>
<span class="o">=</span> <span class="n">ptr2</span><span class="p">;</span> + <span class="n">size</span> <span class="o">=</span> <span class="n">new_size</span><span class="p">;</span> +<span class="p">}</span> +<span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> +</code></pre> +</div> + +<p>With jemalloc's special APIs, we are able to omit the explicit call to <code class="highlighter-rouge">memcpy</code>. +When an expansion cannot be done in place, the copy is still performed +by the allocator, but it is no longer needed on every expansion. This simplifies our user code +to:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">mallocx</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="n">size</span> <span class="o">*=</span> <span class="mi">2</span><span class="p">;</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">rallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span
class="mi">64</span><span class="p">));</span> +<span class="p">}</span> +<span class="n">dallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +</code></pre> +</div> + +<p>To see the real-world benefits of using jemalloc, we look at the benchmarks in +Arrow C++. There we have modeled a typical use case of incrementally building up +an array of primitive values. For the build-up of the array, we don't know the +number of elements in the final array, so we need to continuously expand the +memory region in which the data is stored. The code for this benchmark is part +of the <code class="highlighter-rouge">builder-benchmark</code> in the Arrow C++ sources as +<code class="highlighter-rouge">BuildPrimitiveArrayNoNulls</code>.</p> + +<p>Runtimes without <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s +</code></pre> +</div> + +<p>Runtimes with <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s +</code></pre> +</div> + 
+<p>The benchmark was run three times for each configuration to see the performance +differences. The first run in each configuration yielded the same performance, but +in all subsequent runs the version using jemalloc was about twice as fast. In +these cases, the memory region that was used for constructing the array could be +expanded in place without moving the data around. This was possible because there +were memory pages assigned to the process that were unused but not reclaimed by +the operating system. Without <code class="highlighter-rouge">jemalloc</code>, we cannot make use of them, simply because +the default allocator has no API that provides aligned +reallocation.</p></content><author><name>uwe</name></author></entry><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- --> @@ -1105,87 +1218,4 @@ DataFrame (<a href="https://issues.apache.org/jira/browse/SPARK-20791&qu <p>Reaching this first milestone was a group effort from both the Apache Arrow and Spark communities.
Thanks to the hard work of <a href="https://github.com/wesm">Wes McKinney</a>, <a href="https://github.com/icexelloss">Li Jin</a>, <a href="https://github.com/holdenk">Holden Karau</a>, Reynold Xin, Wenchen Fan, Shane Knapp and many others that -helped push this effort forwards.</p></content><author><name>BryanCutler</name></author></entry><entry><title type="html">Apache Arrow 0.5.0 Release</title><link href="/blog/2017/07/25/0.5.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.5.0 Release" /><published>2017-07-25T00:00:00-04:00</published><updated>2017-07-25T00:00:00-04:00</updated><id>/blog/2017/07/25/0.5.0-release</id><content type="html" xml:base="/blog/2017/07/25/0.5.0-release/"><!-- - ---> - -<p>The Apache Arrow team is pleased to announce the 0.5.0 release. It includes -<a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0"><strong>130 resolved JIRAs</strong></a> with some new features, expanded integration -testing between implementations, and bug fixes. The Arrow memory format remains -stable since the 0.3.x and 0.4.x releases.</p> - -<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your -platform. The <a href="http://arrow.apache.org/release/0.5.0.html">complete changelog</a> is also available.</p> - -<h2 id="expanded-integration-testing">Expanded Integration Testing</h2> - -<p>In this release, we added compatibility tests for dictionary-encoded data -between Java and C++. 
This enables the distinct values (the <em>dictionary</em>) in a -vector to be transmitted as part of an Arrow schema while the record batches -contain integers which correspond to the dictionary.</p> - -<p>So we might have:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>data (string): ['foo', 'bar', 'foo', 'bar'] -</code></pre> -</div> - -<p>In dictionary-encoded form, this could be represented as:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>indices (int8): [0, 1, 0, 1] -dictionary (string): ['foo', 'bar'] -</code></pre> -</div> - -<p>In upcoming releases, we plan to complete integration testing for the remaining -data types (including some more complicated types like unions and decimals) on -the road to a 1.0.0 release in the future.</p> - -<h2 id="c-activity">C++ Activity</h2> - -<p>We completed a number of significant pieces of work in the C++ part of Apache -Arrow.</p> - -<h3 id="using-jemalloc-as-default-memory-allocator">Using jemalloc as default memory allocator</h3> - -<p>We decided to use <a href="https://github.com/jemalloc/jemalloc">jemalloc</a> as the default memory allocator unless it is -explicitly disabled. This memory allocator has significant performance -advantages in Arrow workloads over the default <code class="highlighter-rouge">malloc</code> implementation. We will -publish a blog post going into more detail about this and why you might care.</p> - -<h3 id="sharing-more-c-code-with-apache-parquet">Sharing more C++ code with Apache Parquet</h3> - -<p>We imported the compression library interfaces and dictionary encoding -algorithms from the <a href="http://github.com/apache/parquet-cpp">Apache Parquet C++ library</a>. 
The Parquet library now -depends on this code in Arrow, and we will be able to use it more easily for -data compression in Arrow use cases.</p> - -<p>As part of incorporating Parquet's dictionary encoding utilities, we have -developed an <code class="highlighter-rouge">arrow::DictionaryBuilder</code> class to enable building -dictionary-encoded arrays iteratively. This can help save memory and yield -better performance when interacting with databases, Parquet files, or other -sources which may have columns having many duplicates.</p> - -<h3 id="support-for-lz4-and-zstd-compressors">Support for LZ4 and ZSTD compressors</h3> - -<p>We added LZ4 and ZSTD compression library support. In ARROW-300 and other -planned work, we intend to add some compression features for data sent via RPC.</p> - -<h2 id="python-activity">Python Activity</h2> - -<p>We fixed many bugs which were affecting Parquet and Feather users and fixed -several other rough edges with normal Arrow use. We also added some additional -Arrow type conversions: structs, lists embedded in pandas objects, and Arrow -time types (which deserialize to the <code class="highlighter-rouge">datetime.time</code> type).</p> - -<p>In upcoming releases we plan to continue to improve <a href="http://github.com/dask/dask">Dask</a> support and -performance for distributed processing of Apache Parquet files with pyarrow.</p> - -<h2 id="the-road-ahead">The Road Ahead</h2> - -<p>We have much work ahead of us to build out Arrow integrations in other data -systems to improve their processing performance and interoperability with other -systems.</p> - -<p>We are discussing the roadmap to a future 1.0.0 release on the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">developer -mailing list</a>.
Please join the discussion there.</p></content><author><name>wesm</name></author></entry></feed> \ No newline at end of file +helped push this effort forwards.</p></content><author><name>BryanCutler</name></author></entry></feed> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/powered_by/index.html ---------------------------------------------------------------------- diff --git a/powered_by/index.html b/powered_by/index.html index 2b2b111..8f67798 100644 --- a/powered_by/index.html +++ b/powered_by/index.html @@ -212,8 +212,8 @@ existing Ruby libraries with Apache Arrow. They use Red Arrow.</li> database management system that helps researchers integrate and analyze diverse, multi-dimensional, high resolution data - like genomic, clinical, images, sensor, environmental, and IoT data - -all in one analytical platform. <a href="https://github.com/Paradigm4/stream">SciDB streaming</a> is -powered by Apache Arrow.</li> +all in one analytical platform. <a href="https://github.com/Paradigm4/stream">SciDB streaming</a> and +<a href="https://github.com/Paradigm4/accelerated_io_tools">accelerated_io_tools</a> are powered by Apache Arrow.</li> <li><strong><a href="https://github.com/blue-yonder/turbodbc">Turbodbc</a>:</strong> Python module to access relational databases via the Open Database Connectivity (ODBC) interface. It provides the ability to return Arrow Tables and RecordBatches in addition to the Python Database API
