Repository: arrow-site Updated Branches: refs/heads/asf-site faaac348c -> 97cc0e5af
Publish jemalloc post Project: http://git-wip-us.apache.org/repos/asf/arrow-site/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow-site/commit/97cc0e5a Tree: http://git-wip-us.apache.org/repos/asf/arrow-site/tree/97cc0e5a Diff: http://git-wip-us.apache.org/repos/asf/arrow-site/diff/97cc0e5a Branch: refs/heads/asf-site Commit: 97cc0e5af7113a0fa726e9f4851b6549380b224b Parents: faaac34 Author: Korn, Uwe <[email protected]> Authored: Sat Jul 21 18:09:04 2018 +0200 Committer: Korn, Uwe <[email protected]> Committed: Sat Jul 21 18:09:04 2018 +0200 ---------------------------------------------------------------------- blog/2018/07/20/jemalloc/index.html | 266 +++++++++++++++++++++++++++++++ blog/index.html | 145 +++++++++++++++++ docs/ipc.html | 5 +- docs/memory_layout.html | 12 +- docs/metadata.html | 4 +- feed.xml | 200 +++++++++++++---------- powered_by/index.html | 4 +- 7 files changed, 539 insertions(+), 97 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/blog/2018/07/20/jemalloc/index.html ---------------------------------------------------------------------- diff --git a/blog/2018/07/20/jemalloc/index.html b/blog/2018/07/20/jemalloc/index.html new file mode 100644 index 0000000..ca888a4 --- /dev/null +++ b/blog/2018/07/20/jemalloc/index.html @@ -0,0 +1,266 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link 
href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a 
href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + <li><a href="/docs/js">Javascript API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Faster, scalable memory allocations in Apache Arrow with jemalloc + <a href="/blog/2018/07/20/jemalloc/" class="permalink" title="Permalink">â</a> + </h2> + + + + <div class="panel"> + <div 
class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 20 Jul 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://github.com/xhochy"><i class="fa fa-user"></i> Uwe Korn (uwe)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>With the release of the 0.9 version of Apache Arrow, we have switched our +default allocator for array buffers from the system allocator to jemalloc on +OSX and Linux. This applies to the C++/GLib/Python implementations of Arrow. +Changing the default allocator is normally done to avoid problems +that occur with many small, frequent (de)allocations. In contrast, in Arrow we +normally deal with large in-memory datasets. While jemalloc provides good +strategies for <a href="https://zapier.com/engineering/celery-python-jemalloc/">avoiding RAM fragmentation for allocations that are smaller than +a memory page (4 KB)</a>, it also provides functionality that improves +performance on allocations that span several memory pages.</p> + +<p>Outside of Apache Arrow, <a href="https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/">jemalloc powers the infrastructure of Facebook</a> +(this is also where most of its development happens). It is also used as the +<a href="https://github.com/rust-lang/rust/pull/6895">default allocator in Rust</a>, and it helps <a href="http://download.redis.io/redis-stable/README.md">Redis reduce memory +fragmentation on Linux</a> (see the “Allocator” section of the Redis README).</p> + +<p>One allocation specialty that we require in Arrow is that memory should be +64-byte aligned. This is so that we can get the most performance out of SIMD +instruction sets like AVX. While most modern SIMD instructions also work on +unaligned memory, their performance is much better on aligned memory. To get the +best performance for our analytical applications, we want all memory to be +allocated such that SIMD performance is maximized.</p> + +<p>For aligned allocations, the POSIX APIs only provide the +<code class="highlighter-rouge">aligned_alloc(size_t alignment, size_t size)</code> function to +allocate aligned memory. There is also +<code class="highlighter-rouge">posix_memalign(void **ptr, size_t alignment, size_t size)</code>, which returns a new +aligned allocation through its first argument. But neither of them caters for expanding +an existing allocation. While the <code class="highlighter-rouge">realloc</code> function can often expand allocations +without moving them physically, it does not guarantee that the alignment is kept +when an allocation has to be moved.</p> + +<p>When Arrow is built without jemalloc, this results +in copying the data on each expansion of an allocation. To reduce the number +of memory copies, we use jemalloc’s <code class="highlighter-rouge">*allocx()</code> APIs to create, modify and free +aligned allocations. One of the typical tasks where this gives us a major +speedup is the incremental construction of an Arrow table that consists of +several columns. We often don’t know the size of the table in advance and need +to expand our allocations as the data is loaded.</p> + +<p>To incrementally build a vector using memory expansion by a factor of 2, we +would use the following C code with the standard POSIX APIs:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="kt">size_t</span> <span class="n">new_size</span> <span class="o">=</span> <span class="n">size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> + <span class="kt">void</span><span class="o">*</span> <span class="n">ptr2</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">new_size</span><span class="p">);</span> + <span class="n">memcpy</span><span class="p">(</span><span class="n">ptr2</span><span class="p">,</span> <span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> + <span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">ptr2</span><span class="p">;</span> + <span class="n">size</span> <span class="o">=</span> <span class="n">new_size</span><span class="p">;</span> +<span class="p">}</span> +<span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> +</code></pre> +</div> + +<p>With jemalloc’s special APIs, we are able to omit the explicit call to <code class="highlighter-rouge">memcpy</code>. +Where an expansion cannot be done in place, the allocator still performs the copy +internally, but it is no longer needed on every expansion. This simplifies our user code +to:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">mallocx</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="n">size</span> <span class="o">*=</span> <span class="mi">2</span><span class="p">;</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">rallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="p">}</span> +<span class="n">dallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +</code></pre> +</div> + +<p>To see the real-world benefits of using jemalloc, we look at the benchmarks in +Arrow C++. There we have modeled a typical use case of incrementally building up +an array of primitive values. For the build-up of the array, we don’t know the +number of elements in the final array, so we need to continuously expand the +memory region in which the data is stored. The code for this benchmark is part +of the <code class="highlighter-rouge">builder-benchmark</code> in the Arrow C++ sources as +<code class="highlighter-rouge">BuildPrimitiveArrayNoNulls</code>.</p> + +<p>Runtimes without <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s +</code></pre> +</div> + +<p>Runtimes with <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s +</code></pre> +</div> + 
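The non-jemalloc growth path measured above — allocate a fresh 64-byte-aligned region and copy — can be sketched in portable C11, without any jemalloc dependency. This is only an illustration of the copy-on-grow scheme the post describes; `aligned_grow` is a hypothetical helper name, not an Arrow or jemalloc API:

```c
#include <stdlib.h>
#include <string.h>

/* Fallback growth step: every expansion allocates a new 64-byte-aligned
 * region and copies the old contents into it.  The memcpy below is exactly
 * the copy that jemalloc's rallocx() can often avoid by expanding in place.
 * Note: C11 aligned_alloc() requires the requested size to be a multiple
 * of the alignment. */
static void *aligned_grow(void *ptr, size_t old_size, size_t new_size) {
    void *fresh = aligned_alloc(64, new_size);
    if (fresh == NULL) {
        return NULL; /* caller still owns ptr on failure */
    }
    memcpy(fresh, ptr, old_size);
    free(ptr);
    return fresh;
}
```

With jemalloc enabled, this whole step collapses to a single `rallocx(ptr, new_size, MALLOCX_ALIGN(64))` call, as in the post's second snippet.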
+<p>The benchmark was run three times for each configuration to see the performance +differences. The first run in each configuration yielded the same performance, but +in all subsequent runs the version using jemalloc was about twice as fast. In +these cases, the memory region that was used for constructing the array could be +expanded in place without moving the data around. This was possible because there +were memory pages assigned to the process that were unused but not yet reclaimed by +the operating system. Without <code class="highlighter-rouge">jemalloc</code>, we cannot make use of them, simply because +the default allocator has no API that provides aligned +reallocation.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/blog/index.html ---------------------------------------------------------------------- diff --git a/blog/index.html b/blog/index.html index ddad423..2ef6dd9 100644 --- a/blog/index.html +++ b/blog/index.html @@ -124,6 +124,151 @@ <div class="container"> <h2> + Faster, scalable memory allocations in Apache Arrow with jemalloc + <a href="/blog/2018/07/20/jemalloc/" class="permalink" title="Permalink">â</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 20 Jul 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://github.com/xhochy"><i class="fa fa-user"></i> Uwe Korn (uwe)</a> + </div> + </div> + </div> + <!-- + +--> + +<p>With the release of the 0.9 version of Apache Arrow, we have switched our +default 
allocator for array buffers from the system allocator to jemalloc on +OSX and Linux. This applies to the C++/GLib/Python implementations of Arrow. +Changing the default allocator is normally done to avoid problems +that occur with many small, frequent (de)allocations. In contrast, in Arrow we +normally deal with large in-memory datasets. While jemalloc provides good +strategies for <a href="https://zapier.com/engineering/celery-python-jemalloc/">avoiding RAM fragmentation for allocations that are smaller than +a memory page (4 KB)</a>, it also provides functionality that improves +performance on allocations that span several memory pages.</p> + +<p>Outside of Apache Arrow, <a href="https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/">jemalloc powers the infrastructure of Facebook</a> +(this is also where most of its development happens). It is also used as the +<a href="https://github.com/rust-lang/rust/pull/6895">default allocator in Rust</a>, and it helps <a href="http://download.redis.io/redis-stable/README.md">Redis reduce memory +fragmentation on Linux</a> (see the “Allocator” section of the Redis README).</p> + +<p>One allocation specialty that we require in Arrow is that memory should be +64-byte aligned. This is so that we can get the most performance out of SIMD +instruction sets like AVX. While most modern SIMD instructions also work on +unaligned memory, their performance is much better on aligned memory. To get the +best performance for our analytical applications, we want all memory to be +allocated such that SIMD performance is maximized.</p> + +<p>For aligned allocations, the POSIX APIs only provide the +<code class="highlighter-rouge">aligned_alloc(size_t alignment, size_t size)</code> function to +allocate aligned memory. There is also +<code class="highlighter-rouge">posix_memalign(void **ptr, size_t alignment, size_t size)</code>, which returns a new +aligned allocation through its first argument. But neither of them caters for expanding +an existing allocation. While the <code class="highlighter-rouge">realloc</code> function can often expand allocations +without moving them physically, it does not guarantee that the alignment is kept +when an allocation has to be moved.</p> + +<p>When Arrow is built without jemalloc, this results +in copying the data on each expansion of an allocation. To reduce the number +of memory copies, we use jemalloc’s <code class="highlighter-rouge">*allocx()</code> APIs to create, modify and free +aligned allocations. One of the typical tasks where this gives us a major +speedup is the incremental construction of an Arrow table that consists of +several columns. We often don’t know the size of the table in advance and need +to expand our allocations as the data is loaded.</p> + +<p>To incrementally build a vector using memory expansion by a factor of 2, we +would use the following C code with the standard POSIX APIs:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="kt">size_t</span> <span class="n">new_size</span> <span class="o">=</span> <span class="n">size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> + <span class="kt">void</span><span class="o">*</span> <span class="n">ptr2</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">new_size</span><span class="p">);</span> + <span class="n">memcpy</span><span class="p">(</span><span class="n">ptr2</span><span class="p">,</span> <span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> + <span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">ptr2</span><span class="p">;</span> + <span class="n">size</span> <span class="o">=</span> <span class="n">new_size</span><span class="p">;</span> +<span class="p">}</span> +<span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> +</code></pre> +</div> + +<p>With jemalloc’s special APIs, we are able to omit the explicit call to <code class="highlighter-rouge">memcpy</code>. +Where an expansion cannot be done in place, the allocator still performs the copy +internally, but it is no longer needed on every expansion. This simplifies our user code +to:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">mallocx</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="n">size</span> <span class="o">*=</span> <span class="mi">2</span><span class="p">;</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">rallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="p">}</span> +<span class="n">dallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +</code></pre> +</div> + +<p>To see the real-world benefits of using jemalloc, we look at the benchmarks in +Arrow C++. There we have modeled a typical use case of incrementally building up +an array of primitive values. For the build-up of the array, we don’t know the +number of elements in the final array, so we need to continuously expand the +memory region in which the data is stored. The code for this benchmark is part +of the <code class="highlighter-rouge">builder-benchmark</code> in the Arrow C++ sources as +<code class="highlighter-rouge">BuildPrimitiveArrayNoNulls</code>.</p> + +<p>Runtimes without <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s +</code></pre> +</div> + +<p>Runtimes with <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s +</code></pre> +</div> + +<p>The benchmark was run three times for each configuration to see the performance +differences. The first run in each configuration yielded the same performance, but +in all subsequent runs the version using jemalloc was about twice as fast. In +these cases, the memory region that was used for constructing the array could be +expanded in place without moving the data around. This was possible because there +were memory pages assigned to the process that were unused but not yet reclaimed by +the operating system. 
Without <code class="highlighter-rouge">jemalloc</code>, we cannot make use of them, simply because +the default allocator has no API that provides aligned +reallocation.</p> + + </div> + + + + + + <div class="container"> + <h2> A Native Go Library for Apache Arrow <a href="/blog/2018/03/22/go-code-donation/" class="permalink" title="Permalink">â</a> </h2> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/docs/ipc.html ---------------------------------------------------------------------- diff --git a/docs/ipc.html b/docs/ipc.html index 5022c80..00f7817 100644 --- a/docs/ipc.html +++ b/docs/ipc.html @@ -366,8 +366,9 @@ tools. Arrow implementations in general are not required to implement this data format, though we provide a reference implementation in C++.</p> <p>When writing a standalone encapsulated tensor message, we use the format as -indicated above, but additionally align the starting offset (if writing to a -shared memory region) to be a multiple of 8:</p> +indicated above, but additionally align the starting offset of the metadata as +well as the starting offset of the tensor body (if writing to a shared memory +region) to be multiples of 64 bytes:</p> <div class="highlighter-rouge"><pre class="highlight"><code><PADDING> <metadata size: int32> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/docs/memory_layout.html ---------------------------------------------------------------------- diff --git a/docs/memory_layout.html b/docs/memory_layout.html index ff8f9e8..62a383b 100644 --- a/docs/memory_layout.html +++ b/docs/memory_layout.html @@ -141,7 +141,7 @@ <h2 id="definitions--terminology">Definitions / Terminology</h2> -<p>Since different projects have used differents words to describe various +<p>Since different projects have used different words to describe various concepts, here is a small glossary to help disambiguate.</p> <ul> @@ -406,7 +406,7 @@ values array to 2<sup>31</sup>-1.</li> <p>The offsets array encodes 
a start position in the values array, and the length of the value in each slot is computed using the first difference with the next -element in the offsets array. For example. the position and length of slot j is +element in the offsets array. For example, the position and length of slot j is computed as:</p> <div class="highlighter-rouge"><pre class="highlight"><code>slot_position = offsets[j] @@ -444,7 +444,7 @@ logical type.</p> * Length: 7, Null count: 0 * Null bitmap buffer: Not required - | Bytes 0-7 | Bytes 8-63 | + | Bytes 0-6 | Bytes 7-63 | |------------|-------------| | joemark | unspecified | </code></pre> @@ -474,7 +474,7 @@ logical type.</p> * Offsets buffer (int32) - | Bytes 0-28 | Bytes 29-63 | + | Bytes 0-27 | Bytes 28-63 | |----------------------|-------------| | 0, 2, 4, 7, 7, 8, 10 | unspecified | @@ -499,7 +499,7 @@ type metadata, not the physical memory layout.</p> <p>A struct array does not have any additional allocated physical storage for its values. A struct array must still have an allocated null bitmap, if it has one or more null values.</p> -<p>Physically, a struct type has one child array for each field.</p> +<p>Physically, a struct type has one child array for each field. The child arrays are independent and need not be adjacent to each other in memory.</p> <p>For example, the struct (field names shown here as strings for illustration purposes)</p> @@ -747,7 +747,7 @@ reinterpreted as a non-nested array.</p> <p>Similar to structs, a particular child array may have a non-null slot even if the null bitmap of the parent union array indicates the slot is null. 
Additionally, a child array may have a non-null slot even if -the the types array indicates that a slot contains a different type at the index.</p> +the types array indicates that a slot contains a different type at the index.</p> <h2 id="dictionary-encoding">Dictionary encoding</h2> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/docs/metadata.html ---------------------------------------------------------------------- diff --git a/docs/metadata.html b/docs/metadata.html index 858f0c0..78f288e 100644 --- a/docs/metadata.html +++ b/docs/metadata.html @@ -194,7 +194,7 @@ the columns. The Flatbuffers IDL for a field is:</p> <p>The <code class="highlighter-rouge">type</code> is the logical type of the field. Nested types, such as List, Struct, and Union, have a sequence of child fields.</p> -<p>a JSON representation of the schema is also provided: +<p>A JSON representation of the schema is also provided: Field:</p> <div class="highlighter-rouge"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nt">"name"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"name_of_the_field"</span><span class="p">,</span><span class="w"> @@ -510,7 +510,7 @@ according to the child logical type (e.g. 
<code class="highlighter-rouge">List&l <p>We specify two logical types for variable length bytes:</p> <ul> - <li><code class="highlighter-rouge">Utf8</code> data is unicode values with UTF-8 encoding</li> + <li><code class="highlighter-rouge">Utf8</code> data is Unicode values with UTF-8 encoding</li> <li><code class="highlighter-rouge">Binary</code> is any other variable length bytes</li> </ul> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/feed.xml ---------------------------------------------------------------------- diff --git a/feed.xml b/feed.xml index 98d48d5..8ae2618 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,117 @@ -<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-04-09T04:33:24-04:00</updated><id>/</id><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- +<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-07-21T12:07:24-04:00</updated><id>/</id><entry><title type="html">Faster, scalable memory allocations in Apache Arrow with jemalloc</title><link href="/blog/2018/07/20/jemalloc/" rel="alternate" type="text/html" title="Faster, scalable memory allocations in Apache Arrow with jemalloc" 
/><published>2018-07-20T07:00:00-04:00</published><updated>2018-07-20T07:00:00-04:00</updated><id>/blog/2018/07/20/jemalloc</id><content type="html" xml:base="/blog/2018/07/20/jemalloc/"><!-- + +--> + +<p>With the release of the 0.9 version of Apache Arrow, we have switched our +default allocator for array buffers from the system allocator to jemalloc on +OSX and Linux. This applies to the C++/GLib/Python implementations of Arrow. +Changing the default allocator is usually done to avoid problems +that occur with many small, frequent (de)allocations. In contrast, in Arrow we +normally deal with large in-memory datasets. While jemalloc provides good +strategies for <a href="https://zapier.com/engineering/celery-python-jemalloc/">avoiding RAM fragmentation for allocations that are smaller than +a memory page (4 KB)</a>, it also provides functionality that improves +performance on allocations that span several memory pages.</p> + +<p>Outside of Apache Arrow, <a href="https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/">jemalloc powers the infrastructure of Facebook</a> +(this is also where most of its development happens). It is also used as the +<a href="https://github.com/rust-lang/rust/pull/6895">default allocator in Rust</a>, and it helps <a href="http://download.redis.io/redis-stable/README.md">Redis reduce memory +fragmentation on Linux</a> (see the "Allocator" section of its README).</p> + +<p>One special requirement we have for allocations in Arrow is that memory should be +64-byte aligned. This is so that we can get the most performance out of SIMD +instruction sets like AVX. While most modern SIMD instructions also work on +unaligned memory, their performance is much better on aligned memory.
To get the +best performance for our analytical applications, we want all memory to be +allocated such that SIMD performance is maximized.</p> + +<p>For aligned allocations, the C11/POSIX APIs only provide the +<code class="highlighter-rouge">aligned_alloc(size_t alignment, size_t size)</code> function to +allocate aligned memory. There is also +<code class="highlighter-rouge">posix_memalign(void **ptr, size_t alignment, size_t size)</code>, which +allocates aligned memory and returns it through an out-parameter. But neither of them caters for expanding +an existing allocation. While the <code class="highlighter-rouge">realloc</code> function can often expand allocations +without moving them physically, it does not guarantee that the alignment is +kept when the allocation has to be moved.</p> + +<p>When Arrow was built without jemalloc enabled, this resulted +in copying the data on each expansion of an allocation. To reduce the number +of memory copies, we use jemalloc's <code class="highlighter-rouge">*allocx()</code>-APIs to create, modify and free +aligned allocations. One of the typical tasks where this gives us a major +speedup is the incremental construction of an Arrow table that consists of +several columns.
We often don't know the size of the table in advance and need +to expand our allocations as the data is loaded.</p> + +<p>To incrementally build a vector, doubling its memory with each expansion, we +would use the following C code with the standard POSIX APIs:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="kt">size_t</span> <span class="n">new_size</span> <span class="o">=</span> <span class="n">size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> + <span class="kt">void</span><span class="o">*</span> <span class="n">ptr2</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">new_size</span><span class="p">);</span> + <span class="n">memcpy</span><span class="p">(</span><span class="n">ptr2</span><span class="p">,</span> <span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span> + <span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> + <span class="n">ptr</span>
<span class="o">=</span> <span class="n">ptr2</span><span class="p">;</span> + <span class="n">size</span> <span class="o">=</span> <span class="n">new_size</span><span class="p">;</span> +<span class="p">}</span> +<span class="n">free</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span> +</code></pre> +</div> + +<p>With jemalloc's special APIs, we are able to omit the explicit call to <code class="highlighter-rouge">memcpy</code>. +When an expansion cannot be done in place, the copy is still performed +by the allocator, but it is no longer needed on every expansion. This simplifies our user code +to:</p> + +<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">;</span> +<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">mallocx</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> + <span class="n">size</span> <span class="o">*=</span> <span class="mi">2</span><span class="p">;</span> + <span class="n">ptr</span> <span class="o">=</span> <span class="n">rallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span
class="mi">64</span><span class="p">));</span> +<span class="p">}</span> +<span class="n">dallocx</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">MALLOCX_ALIGN</span><span class="p">(</span><span class="mi">64</span><span class="p">));</span> +</code></pre> +</div> + +<p>To see the real-world benefits of using jemalloc, we look at the benchmarks in +Arrow C++. There we have modeled a typical use case of incrementally building up +an array of primitive values. For the build-up of the array, we don't know the +number of elements in the final array, so we need to continuously expand the +memory region in which the data is stored. The code for this benchmark is part +of the <code class="highlighter-rouge">builder-benchmark</code> in the Arrow C++ sources as +<code class="highlighter-rouge">BuildPrimitiveArrayNoNulls</code>.</p> + +<p>Runtimes without <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 636726 us 804.114MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 621345 us 824.019MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 627693 us 815.774MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 625008 us 819.19MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 8034 us 10.3829MB/s +</code></pre> +</div> + +<p>Runtimes with <code class="highlighter-rouge">jemalloc</code>:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>BM_BuildPrimitiveArrayNoNulls/repeats:3 630881 us 811.563MB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3 351039 us 1.42434GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_mean 444937 us 1.21125GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_median 352891 us 1.41687GB/s +BM_BuildPrimitiveArrayNoNulls/repeats:3_stddev 161035 us 371.335MB/s +</code></pre> +</div> + 
+<p>The benchmark was run three times for each configuration to see the performance +differences. The first run in each configuration yielded the same performance, but +in all subsequent runs the version using jemalloc was about twice as fast. In +these cases, the memory region that was used for constructing the array could be +expanded in place without moving the data around. This was possible because there +were memory pages assigned to the process that were unused but not reclaimed by +the operating system. Without <code class="highlighter-rouge">jemalloc</code>, we cannot make use of them, simply because +the default allocator has no API that provides aligned +reallocation.</p></content><author><name>uwe</name></author></entry><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- --> @@ -1105,87 +1218,4 @@ DataFrame (<a href="https://issues.apache.org/jira/browse/SPARK-20791&qu <p>Reaching this first milestone was a group effort from both the Apache Arrow and Spark communities.
Thanks to the hard work of <a href="https://github.com/wesm">Wes McKinney</a>, <a href="https://github.com/icexelloss">Li Jin</a>, <a href="https://github.com/holdenk">Holden Karau</a>, Reynold Xin, Wenchen Fan, Shane Knapp and many others that -helped push this effort forwards.</p></content><author><name>BryanCutler</name></author></entry><entry><title type="html">Apache Arrow 0.5.0 Release</title><link href="/blog/2017/07/25/0.5.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.5.0 Release" /><published>2017-07-25T00:00:00-04:00</published><updated>2017-07-25T00:00:00-04:00</updated><id>/blog/2017/07/25/0.5.0-release</id><content type="html" xml:base="/blog/2017/07/25/0.5.0-release/"><!-- - ---> - -<p>The Apache Arrow team is pleased to announce the 0.5.0 release. It includes -<a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0"><strong>130 resolved JIRAs</strong></a> with some new features, expanded integration -testing between implementations, and bug fixes. The Arrow memory format remains -stable since the 0.3.x and 0.4.x releases.</p> - -<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your -platform. The <a href="http://arrow.apache.org/release/0.5.0.html">complete changelog</a> is also available.</p> - -<h2 id="expanded-integration-testing">Expanded Integration Testing</h2> - -<p>In this release, we added compatibility tests for dictionary-encoded data -between Java and C++. 
This enables the distinct values (the <em>dictionary</em>) in a -vector to be transmitted as part of an Arrow schema while the record batches -contain integers which correspond to the dictionary.</p> - -<p>So we might have:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>data (string): ['foo', 'bar', 'foo', 'bar'] -</code></pre> -</div> - -<p>In dictionary-encoded form, this could be represented as:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>indices (int8): [0, 1, 0, 1] -dictionary (string): ['foo', 'bar'] -</code></pre> -</div> - -<p>In upcoming releases, we plan to complete integration testing for the remaining -data types (including some more complicated types like unions and decimals) on -the road to a 1.0.0 release in the future.</p> - -<h2 id="c-activity">C++ Activity</h2> - -<p>We completed a number of significant pieces of work in the C++ part of Apache -Arrow.</p> - -<h3 id="using-jemalloc-as-default-memory-allocator">Using jemalloc as default memory allocator</h3> - -<p>We decided to use <a href="https://github.com/jemalloc/jemalloc">jemalloc</a> as the default memory allocator unless it is -explicitly disabled. This memory allocator has significant performance -advantages in Arrow workloads over the default <code class="highlighter-rouge">malloc</code> implementation. We will -publish a blog post going into more detail about this and why you might care.</p> - -<h3 id="sharing-more-c-code-with-apache-parquet">Sharing more C++ code with Apache Parquet</h3> - -<p>We imported the compression library interfaces and dictionary encoding -algorithms from the <a href="http://github.com/apache/parquet-cpp">Apache Parquet C++ library</a>. 
The Parquet library now -depends on this code in Arrow, and we will be able to use it more easily for -data compression in Arrow use cases.</p> - -<p>As part of incorporating Parquet's dictionary encoding utilities, we have -developed an <code class="highlighter-rouge">arrow::DictionaryBuilder</code> class to enable building -dictionary-encoded arrays iteratively. This can help save memory and yield -better performance when interacting with databases, Parquet files, or other -sources which may have columns having many duplicates.</p> - -<h3 id="support-for-lz4-and-zstd-compressors">Support for LZ4 and ZSTD compressors</h3> - -<p>We added LZ4 and ZSTD compression library support. In ARROW-300 and other -planned work, we intend to add some compression features for data sent via RPC.</p> - -<h2 id="python-activity">Python Activity</h2> - -<p>We fixed many bugs which were affecting Parquet and Feather users and fixed -several other rough edges with normal Arrow use. We also added some additional -Arrow type conversions: structs, lists embedded in pandas objects, and Arrow -time types (which deserialize to the <code class="highlighter-rouge">datetime.time</code> type).</p> - -<p>In upcoming releases we plan to continue to improve <a href="http://github.com/dask/dask">Dask</a> support and -performance for distributed processing of Apache Parquet files with pyarrow.</p> - -<h2 id="the-road-ahead">The Road Ahead</h2> - -<p>We have much work ahead of us to build out Arrow integrations in other data -systems to improve their processing performance and interoperability with other -systems.</p> - -<p>We are discussing the roadmap to a future 1.0.0 release on the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">developer -mailing list</a>.
Please join the discussion there.</p></content><author><name>wesm</name></author></entry></feed> \ No newline at end of file +helped push this effort forwards.</p></content><author><name>BryanCutler</name></author></entry></feed> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/arrow-site/blob/97cc0e5a/powered_by/index.html ---------------------------------------------------------------------- diff --git a/powered_by/index.html b/powered_by/index.html index 2b2b111..8f67798 100644 --- a/powered_by/index.html +++ b/powered_by/index.html @@ -212,8 +212,8 @@ existing Ruby libraries with Apache Arrow. They use Red Arrow.</li> database management system that helps researchers integrate and analyze diverse, multi-dimensional, high resolution data - like genomic, clinical, images, sensor, environmental, and IoT data - -all in one analytical platform. <a href="https://github.com/Paradigm4/stream">SciDB streaming</a> is -powered by Apache Arrow.</li> +all in one analytical platform. <a href="https://github.com/Paradigm4/stream">SciDB streaming</a> and +<a href="https://github.com/Paradigm4/accelerated_io_tools">accelerated_io_tools</a> are powered by Apache Arrow.</li> <li><strong><a href="https://github.com/blue-yonder/turbodbc">Turbodbc</a>:</strong> Python module to access relational databases via the Open Database Connectivity (ODBC) interface. It provides the ability to return Arrow Tables and RecordBatches in addition to the Python Database API
