This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e9e47f1 Updating built site (build
f45dd485ccb491ea2681d369a0258f44cc560ac6)
e9e47f1 is described below
commit e9e47f1c43a9c5c60934c496a9d4e333f65958f2
Author: Neal Richardson <[email protected]>
AuthorDate: Wed Apr 22 23:55:32 2020 +0000
Updating built site (build f45dd485ccb491ea2681d369a0258f44cc560ac6)
---
...manifest-07b3643e10d26ac1b64aff61ff62464d.json} | 2 +-
blog/2020/04/21/0.17.0-release/index.html | 496 +++++++++++++++++++++
blog/index.html | 15 +
feed.xml | 453 ++++++++++---------
4 files changed, 757 insertions(+), 209 deletions(-)
diff --git a/assets/.sprockets-manifest-9f55fef5b0b2da26929349fe08192161.json
b/assets/.sprockets-manifest-07b3643e10d26ac1b64aff61ff62464d.json
similarity index 79%
rename from assets/.sprockets-manifest-9f55fef5b0b2da26929349fe08192161.json
rename to assets/.sprockets-manifest-07b3643e10d26ac1b64aff61ff62464d.json
index ec8cb97..2bf02f9 100644
--- a/assets/.sprockets-manifest-9f55fef5b0b2da26929349fe08192161.json
+++ b/assets/.sprockets-manifest-07b3643e10d26ac1b64aff61ff62464d.json
@@ -1 +1 @@
-{"files":{"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js":{"logical_path":"main.js","mtime":"2020-04-21T08:23:41-04:00","size":124531,"digest":"18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33","integrity":"sha256-GM0wKVV/c8HuguQRExJ7BPb82ExW2dsMucQOvibvbjM="}},"assets":{"main.js":"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js"}}
\ No newline at end of file
+{"files":{"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js":{"logical_path":"main.js","mtime":"2020-04-22T19:55:24-04:00","size":124531,"digest":"18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33","integrity":"sha256-GM0wKVV/c8HuguQRExJ7BPb82ExW2dsMucQOvibvbjM="}},"assets":{"main.js":"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js"}}
\ No newline at end of file
diff --git a/blog/2020/04/21/0.17.0-release/index.html
b/blog/2020/04/21/0.17.0-release/index.html
new file mode 100644
index 0000000..cd67589
--- /dev/null
+++ b/blog/2020/04/21/0.17.0-release/index.html
@@ -0,0 +1,496 @@
+<!DOCTYPE html>
+<html lang="en-US">
+ <head>
+ <meta charset="UTF-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <!-- The above meta tags *must* come first in the head; any other head
content must come *after* these tags -->
+
+ <title>Apache Arrow 0.17.0 Release | Apache Arrow</title>
+
+
+ <!-- Begin Jekyll SEO tag v2.6.1 -->
+<meta name="generator" content="Jekyll v3.8.4" />
+<meta property="og:title" content="Apache Arrow 0.17.0 Release" />
+<meta name="author" content="pmc" />
+<meta property="og:locale" content="en_US" />
+<meta name="description" content="The Apache Arrow team is pleased to announce
the 0.17.0 release. This covers over 2 months of development work and includes
569 resolved issues from 79 distinct contributors. See the Install Page to
learn how to get the libraries for your platform. The release notes below are
not exhaustive and only expose selected highlights of the release. Many other
bugfixes and improvements have been made: we refer you to the complete
changelog. Community Since the 0 [...]
+<meta property="og:description" content="The Apache Arrow team is pleased to
announce the 0.17.0 release. This covers over 2 months of development work and
includes 569 resolved issues from 79 distinct contributors. See the Install
Page to learn how to get the libraries for your platform. The release notes
below are not exhaustive and only expose selected highlights of the release.
Many other bugfixes and improvements have been made: we refer you to the
complete changelog. Community Sinc [...]
+<link rel="canonical"
href="https://arrow.apache.org/blog/2020/04/21/0.17.0-release/" />
+<meta property="og:url"
content="https://arrow.apache.org/blog/2020/04/21/0.17.0-release/" />
+<meta property="og:site_name" content="Apache Arrow" />
+<meta property="og:image" content="https://arrow.apache.org/img/arrow.png" />
+<meta property="og:type" content="article" />
+<meta property="article:published_time" content="2020-04-21T02:00:00-04:00" />
+<meta name="twitter:card" content="summary_large_image" />
+<meta property="twitter:image"
content="https://arrow.apache.org/img/arrow.png" />
+<meta property="twitter:title" content="Apache Arrow 0.17.0 Release" />
+<meta name="twitter:site" content="@ApacheArrow" />
+<meta name="twitter:creator" content="@pmc" />
+<script type="application/ld+json">
+{"headline":"Apache Arrow 0.17.0
Release","dateModified":"2020-04-21T02:00:00-04:00","datePublished":"2020-04-21T02:00:00-04:00","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"},"name":"pmc"},"@type":"BlogPosting","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2020/04/21/0.17.0-release/"},"description":"The
Apache Arrow team is pleased to announce the 0.17.0 release. This covers over
2 months of d [...]
+<!-- End Jekyll SEO tag -->
+
+
+ <!-- favicons -->
+ <link rel="icon" type="image/png" sizes="16x16"
href="/img/favicon-16x16.png" id="light1">
+ <link rel="icon" type="image/png" sizes="32x32"
href="/img/favicon-32x32.png" id="light2">
+ <link rel="apple-touch-icon" type="image/png" sizes="180x180"
href="/img/apple-touch-icon.png" id="light3">
+ <link rel="apple-touch-icon" type="image/png" sizes="120x120"
href="/img/apple-touch-icon-120x120.png" id="light4">
+ <link rel="apple-touch-icon" type="image/png" sizes="76x76"
href="/img/apple-touch-icon-76x76.png" id="light5">
+ <link rel="apple-touch-icon" type="image/png" sizes="60x60"
href="/img/apple-touch-icon-60x60.png" id="light6">
+ <!-- dark mode favicons -->
+ <link rel="icon" type="image/png" sizes="16x16"
href="/img/favicon-16x16-dark.png" id="dark1">
+ <link rel="icon" type="image/png" sizes="32x32"
href="/img/favicon-32x32-dark.png" id="dark2">
+ <link rel="apple-touch-icon" type="image/png" sizes="180x180"
href="/img/apple-touch-icon-dark.png" id="dark3">
+ <link rel="apple-touch-icon" type="image/png" sizes="120x120"
href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
+ <link rel="apple-touch-icon" type="image/png" sizes="76x76"
href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
+ <link rel="apple-touch-icon" type="image/png" sizes="60x60"
href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
+
+  <script>
+    // Switch to the dark-mode favicons if prefers-color-scheme: dark.
+    // Cache both sets of <link> elements up front: re-querying inside
+    // onUpdate would return null for whichever set is currently
+    // detached from the document.
+    var lightIcons = [], darkIcons = [];
+    for (var i = 1; i <= 6; i++) {
+      lightIcons.push(document.querySelector('link#light' + i));
+      darkIcons.push(document.querySelector('link#dark' + i));
+    }
+    function onUpdate() {
+      var remove = matcher.matches ? lightIcons : darkIcons;
+      var append = matcher.matches ? darkIcons : lightIcons;
+      remove.forEach(function (el) { el.remove(); });
+      append.forEach(function (el) { document.head.append(el); });
+    }
+    var matcher = window.matchMedia('(prefers-color-scheme: dark)');
+    matcher.addListener(onUpdate);
+    onUpdate();
+  </script>
+
+ <link rel="stylesheet"
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+ <link href="/css/main.css" rel="stylesheet">
+ <link href="/css/syntax.css" rel="stylesheet">
+ <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"
integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
crossorigin="anonymous"></script>
+ <script
src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js"
integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49"
crossorigin="anonymous"></script>
+
+ <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script>
+<script>
+ window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments);}
+ gtag('js', new Date());
+
+ gtag('config', 'UA-107500873-1');
+</script>
+
+
+ </head>
+
+
+<body class="wrap">
+ <header>
+ <nav class="navbar navbar-expand-md navbar-dark bg-dark">
+ <a class="navbar-brand" href="/"><img src="/img/arrow-inverse-300px.png"
height="60px"/></a>
+ <button class="navbar-toggler" type="button" data-toggle="collapse"
data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false"
aria-label="Toggle navigation">
+ <span class="navbar-toggler-icon"></span>
+ </button>
+
+ <!-- Collect the nav links, forms, and other content for toggling -->
+ <div class="collapse navbar-collapse" id="arrow-navbar">
+ <ul class="nav navbar-nav">
+ <li class="nav-item dropdown">
+ <a class="nav-link dropdown-toggle" href="#"
+ id="navbarDropdownProjectLinks" role="button"
data-toggle="dropdown"
+ aria-haspopup="true" aria-expanded="false">
+ Project Links
+ </a>
+ <div class="dropdown-menu"
aria-labelledby="navbarDropdownProjectLinks">
+ <a class="dropdown-item" href="/install/">Installation</a>
+ <a class="dropdown-item" href="/release/">Releases</a>
+ <a class="dropdown-item" href="/faq/">FAQ</a>
+ <a class="dropdown-item" href="/blog/">Blog</a>
+ <a class="dropdown-item"
href="https://github.com/apache/arrow">Source Code</a>
+ <a class="dropdown-item"
href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a>
+ </div>
+ </li>
+ <li class="nav-item dropdown">
+ <a class="nav-link dropdown-toggle" href="#"
+ id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
+ aria-haspopup="true" aria-expanded="false">
+ Community
+ </a>
+ <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
+ <a class="dropdown-item"
href="http://mail-archives.apache.org/mod_mbox/arrow-user/">User Mailing
List</a>
+ <a class="dropdown-item"
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Dev Mailing List</a>
+ <a class="dropdown-item"
href="https://cwiki.apache.org/confluence/display/ARROW">Developer Wiki</a>
+ <a class="dropdown-item" href="/committers/">Committers</a>
+ <a class="dropdown-item" href="/powered_by/">Powered By</a>
+ </div>
+ </li>
+ <li class="nav-item">
+ <a class="nav-link" href="/docs/format/Columnar.html"
+ role="button" aria-haspopup="true" aria-expanded="false">
+ Specification
+ </a>
+ </li>
+ <li class="nav-item dropdown">
+ <a class="nav-link dropdown-toggle" href="#"
+ id="navbarDropdownDocumentation" role="button"
data-toggle="dropdown"
+ aria-haspopup="true" aria-expanded="false">
+ Documentation
+ </a>
+ <div class="dropdown-menu"
aria-labelledby="navbarDropdownDocumentation">
+ <a class="dropdown-item" href="/docs">Project Docs</a>
+ <a class="dropdown-item" href="/docs/python">Python</a>
+ <a class="dropdown-item" href="/docs/cpp">C++</a>
+ <a class="dropdown-item" href="/docs/java">Java</a>
+ <a class="dropdown-item" href="/docs/c_glib">C GLib</a>
+ <a class="dropdown-item" href="/docs/js">JavaScript</a>
+ <a class="dropdown-item" href="/docs/r">R</a>
+ </div>
+ </li>
+ <!-- <li><a href="/blog">Blog</a></li> -->
+ <li class="nav-item dropdown">
+ <a class="nav-link dropdown-toggle" href="#"
+ id="navbarDropdownASF" role="button" data-toggle="dropdown"
+ aria-haspopup="true" aria-expanded="false">
+ ASF Links
+ </a>
+ <div class="dropdown-menu" aria-labelledby="navbarDropdownASF">
+ <a class="dropdown-item" href="http://www.apache.org/">ASF
Website</a>
+ <a class="dropdown-item"
href="http://www.apache.org/licenses/">License</a>
+ <a class="dropdown-item"
href="http://www.apache.org/foundation/sponsorship.html">Donate</a>
+ <a class="dropdown-item"
href="http://www.apache.org/foundation/thanks.html">Thanks</a>
+ <a class="dropdown-item"
href="http://www.apache.org/security/">Security</a>
+ </div>
+ </li>
+ </ul>
+ <div class="flex-row justify-content-end ml-md-auto">
+ <a class="d-sm-none d-md-inline pr-2"
href="https://www.apache.org/events/current-event.html">
+ <img src="https://www.apache.org/events/current-event-234x60.png"/>
+ </a>
+ <a href="http://www.apache.org/">
+ <img src="/img/asf_logo.svg" width="120px"/>
+ </a>
+ </div>
+ </div><!-- /.navbar-collapse -->
+ </div>
+ </nav>
+
+ </header>
+
+ <div class="container p-lg-4">
+ <main role="main">
+
+
+
+<h1>
+ Apache Arrow 0.17.0 Release
+</h1>
+
+
+
+<p>
+ <span class="badge badge-secondary">Published</span>
+ <span class="published">
+ 21 Apr 2020
+ </span>
+ <br />
+ <span class="badge badge-secondary">By</span>
+
+ <a href="https://arrow.apache.org">The Apache Arrow PMC (pmc) </a>
+
+
+
+</p>
+
+
+ <!--
+
+-->
+
+<p>The Apache Arrow team is pleased to announce the 0.17.0 release. This covers
+over 2 months of development work and includes <a
href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.17.0"><strong>569
resolved issues</strong></a>
+from <a
href="https://arrow.apache.org/release/0.17.0.html#contributors"><strong>79
distinct contributors</strong></a>. See the Install Page to learn how to
+get the libraries for your platform.</p>
+
+<p>The release notes below are not exhaustive and only expose selected
highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the <a href="https://arrow.apache.org/release/0.17.0.html">complete
changelog</a>.</p>
+
+<h2 id="community">Community</h2>
+
+<p>Since the 0.16.0 release, two committers have joined the Project Management
+Committee (PMC):</p>
+
+<ul>
+ <li><a href="https://github.com/nealrichardson">Neal Richardson</a></li>
+ <li><a href="https://github.com/fsaintjacques">François
Saint-Jacques</a></li>
+</ul>
+
+<p>Thank you for all your contributions!</p>
+
+<h2 id="columnar-format-notes">Columnar Format Notes</h2>
+
+<p>A <a
href="https://arrow.apache.org/docs/format/CDataInterface.html">C-level Data
Interface</a> was designed to ease data sharing inside a single
+process. It allows different runtimes or libraries to share Arrow data using a
+well-known binary layout and metadata representation, without any copies. Third-party
+libraries can use the C interface to import and export the Arrow columnar
+format in-process without requiring any new code dependencies.</p>
+
+<p>The C++ library now includes an implementation of the C Data Interface, and
+Python and R have bindings to that implementation.</p>
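For third-party libraries that want to consume the interface without linking against Arrow, the key structs can be declared locally. As a hypothetical illustration (not the official bindings), the ArrowSchema struct from the interface specification can be mirrored with Python's standard-library ctypes:

```python
import ctypes

class ArrowSchema(ctypes.Structure):
    """ctypes mirror of the ArrowSchema struct from the C Data Interface."""

# The fields reference ArrowSchema itself, so they are assigned after the class.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),    # type encoding, e.g. b"l" for int64
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),      # bit flags; 2 == ARROW_FLAG_NULLABLE
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.c_void_p),   # really void (*release)(struct ArrowSchema*)
    ("private_data", ctypes.c_void_p),
]

# A nullable int64 field named "ints"
schema = ArrowSchema(format=b"l", name=b"ints", flags=2)
```

A real producer would also populate the release callback so consumers can free the schema; it is left as an opaque pointer here for brevity.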
+
+<h2 id="arrow-flight-rpc-notes">Arrow Flight RPC notes</h2>
+
+<ul>
+ <li>Adopted new DoExchange bi-directional data RPC</li>
+ <li>ListFlights supports being passed a Criteria argument in
+Java/C++/Python. This allows applications to search for flights satisfying a
+given query.</li>
+ <li>Custom metadata can be attached to errors that the server sends to the
+client, which can be used to encode richer application-specific
information.</li>
+ <li>A number of minor bugs were fixed, including proper handling of empty
null
+arrays in Java and round-tripping of certain Arrow status codes in
+C++/Python.</li>
+</ul>
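Flight treats Criteria as opaque, application-defined bytes. A minimal sketch of the server-side filtering idea, with a hypothetical substring interpretation of the criteria payload:

```python
def list_flights(flights, criteria=None):
    """Sketch of ListFlights handling an optional Criteria payload.

    Flight passes Criteria through as application-defined bytes; here we
    (hypothetically) interpret them as a substring match on the flight
    descriptor path.
    """
    if criteria is None:
        return list(flights)
    needle = criteria.decode("utf-8")
    return [f for f in flights if needle in f]

flights = ["sales/2020", "inventory/2020", "sales/2019"]
assert list_flights(flights, b"sales") == ["sales/2020", "sales/2019"]
```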
+
+<h2 id="c-notes">C++ notes</h2>
+
+<h3 id="feather-v2">Feather V2</h3>
+
+<p>The “Feather V2” format, based on the Arrow IPC file format, was developed.
+Feather V2 features full support for all Arrow data types, and resolves the 2GB
+per-column limitation for large amounts of string data that the <a
href="https://github.com/wesm/feather">original
+Feather implementation</a> had. Feather V2 also introduces experimental IPC
+message compression using the LZ4 frame format or ZSTD. This will be formalized
+later in the Arrow format.</p>
+
+<h3 id="c-datasets">C++ Datasets</h3>
+
+<ul>
+  <li>Improve speed on high-latency file systems by relaxing discovery
+validation</li>
+ <li>Better performance with Arrow IPC files using column projection</li>
+ <li>Add the ability to list files in FileSystemDataset</li>
+ <li>Add support for Parquet file reader options</li>
+ <li>Support dictionary columns in partition expression</li>
+ <li>Fix various crashes and other issues</li>
+</ul>
+
+<h3 id="c-parquet-notes">C++ Parquet notes</h3>
+
+<ul>
+  <li>Support for writing nested types to Parquet format was completed. The
+legacy code path can still be enabled through a Parquet write option in C++
+and an environment variable in Python. Read support will come in a future
+release.</li>
+  <li>The BYTE_STREAM_SPLIT encoding was implemented for floating-point types.
+It can improve compression efficiency for high-entropy data.</li>
+ <li>Expose Parquet schema field_id as Arrow field metadata</li>
+ <li>Support for DataPageV2 data page format</li>
+</ul>
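The idea behind BYTE_STREAM_SPLIT can be sketched in plain Python (illustrative only, not the C++ implementation): byte k of every fixed-width value is gathered into stream k, so that similar bytes, such as float sign/exponent bytes, end up adjacent and compress better:

```python
import struct

def byte_stream_split_encode(values, width=4):
    """Gather byte j of every value into stream j, then concatenate
    the streams (the BYTE_STREAM_SPLIT layout)."""
    raw = b"".join(values)
    n = len(values)
    return b"".join(
        bytes(raw[k * width + j] for k in range(n)) for j in range(width)
    )

def byte_stream_split_decode(encoded, width=4):
    """Invert the encoding: value k is byte k of each of the streams."""
    n = len(encoded) // width
    return [bytes(encoded[j * n + k] for j in range(width)) for k in range(n)]

# Round-trip three little-endian float32 values
floats = [struct.pack("<f", x) for x in (1.5, -2.25, 3.125)]
encoded = byte_stream_split_encode(floats)
assert byte_stream_split_decode(encoded) == floats
```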
+
+<h3 id="c-build-notes">C++ build notes</h3>
+
+<ul>
+ <li>We continued to make the core C++ library build simpler and faster.
Among the
+improvements are the removal of the dependency on Thrift IDL compiler at
+build time; while Parquet still requires the Thrift runtime C++ library, its
+dependencies are much lighter. We also further reduced the number of build
+configurations that require Boost, and when Boost is needed to be built, we
+only download the components we need, reducing the size of the Boost bundle
+by 90%.</li>
+ <li>Improved support for building on ARM platforms</li>
+ <li>Upgraded LLVM version from 7 to 8</li>
+  <li>Simplified the SIMD build configuration with the ARROW_SIMD_LEVEL option,
+which selects among no SIMD, SSE4.2, AVX2, or AVX512.</li>
+ <li>Fixed a number of bugs affecting compilation on aarch64 platforms</li>
+</ul>
+
+<h3 id="other-c-notes">Other C++ notes</h3>
+
+<ul>
+ <li>Many crashes on invalid input detected by <a
href="https://google.github.io/oss-fuzz/">OSS-Fuzz</a> in the IPC reader and
+in Parquet-Arrow reading were fixed. See our recent <a
href="https://arrow.apache.org/blog/2020/03/31/fuzzing-arrow-ipc/">blog
post</a> for more
+details.</li>
+ <li>A “Device” abstraction was added to simplify buffer management and
movement
+across heterogeneous hardware configurations, e.g. CPUs and GPUs.</li>
+  <li>A streaming CSV reader was implemented, yielding individual RecordBatches
+and helping limit overall memory usage.</li>
+ <li>Array casting from Decimal128 to integer types and to Decimal128 with
+different scale/precision was added.</li>
+ <li>Sparse CSF tensors are now supported.</li>
+ <li>When creating an Array, the null bitmap is not kept if the null count is
known to be zero</li>
+ <li>Compressor support for the LZ4 frame format (LZ4_FRAME) was added</li>
+ <li>An event-driven interface for reading IPC streams was added.</li>
+  <li>Further core APIs that required passing an explicit out-parameter were
+migrated to <code class="highlighter-rouge">Result&lt;T&gt;</code>.</li>
+ <li>New analytics kernels for match, sort indices / argsort, top-k</li>
+</ul>
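The semantics of the new sort-indices and top-k kernels can be sketched in plain Python (the real kernels operate on Arrow arrays in C++; the names below are illustrative):

```python
def sort_indices(values):
    """Return the indices that would sort `values` ascending (argsort)."""
    return sorted(range(len(values)), key=values.__getitem__)

def top_k_indices(values, k):
    """Return the indices of the k largest values."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=True)[:k]

v = [3, 1, 4, 1, 5]
assert sort_indices(v) == [1, 3, 0, 2, 4]
assert top_k_indices(v, 2) == [4, 2]
```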
+
+<h2 id="java-notes">Java notes</h2>
+
+<ul>
+  <li>Netty dependencies were removed from the BufferAllocator and ReferenceManager
+classes. In the future, we plan to move Netty-related classes to a separate
+module.</li>
+ <li>New features were provided to support efficiently appending vector/vector
+schema root values in batch.</li>
+  <li>Comparing a range of values in dense union vectors is now supported.</li>
+ <li>The quick sort algorithm was improved to avoid degenerating to the worst
case.</li>
+</ul>
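One standard way to keep quicksort from degenerating on already-sorted input is a median-of-three pivot; the sketch below illustrates the idea in Python and is not the Java implementation itself:

```python
def quicksort(a):
    """Quicksort with a median-of-three pivot, which avoids the O(n^2)
    degeneration that a first-element pivot hits on sorted input."""
    if len(a) <= 1:
        return list(a)
    # Take the median of the first, middle, and last elements as pivot
    pivot = sorted((a[0], a[len(a) // 2], a[-1]))[1]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

assert quicksort([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]
assert quicksort(list(range(100))) == list(range(100))
```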
+
+<h2 id="python-notes">Python notes</h2>
+
+<h3 id="datasets">Datasets</h3>
+
+<ul>
+ <li>Updated <code class="highlighter-rouge">pyarrow.dataset</code> module
following the changes in the C++ Datasets
+project. This release also adds <a
href="https://arrow.apache.org/docs/python/dataset.html">richer
documentation</a> on the datasets
+module.</li>
+ <li>Support for the improved dataset functionality in
+<code
class="highlighter-rouge">pyarrow.parquet.read_table/ParquetDataset</code>. To
enable, pass
+<code class="highlighter-rouge">use_legacy_dataset=False</code>. Among other
+things, this makes it possible to specify filters on all columns, not only the
+partition keys (using row group statistics), and enables different
+partitioning schemes. See the “note” in the
+<a
href="https://arrow.apache.org/docs/python/parquet.html#reading-from-partitioned-datasets"><code
class="highlighter-rouge">ParquetDataset</code> documentation</a>.</li>
+</ul>
+
+<h3 id="packaging">Packaging</h3>
+
+<ul>
+ <li>Wheels for Python 3.8 are now available</li>
+ <li>Support for Python 2.7 has been dropped as Python 2.x reached
end-of-life in
+January 2020.</li>
+ <li>Nightly wheels and conda packages are now available for testing or other
+development purposes. See the <a
href="https://arrow.apache.org/docs/python/install.html#installing-nightly-packages">installation
guide</a></li>
+</ul>
+
+<h3 id="other-improvements">Other improvements</h3>
+
+<ul>
+ <li>Conversion to numpy/pandas for FixedSizeList, LargeString,
LargeBinary</li>
+  <li>Support for sparse CSC matrices and sparse CSF tensors was added
+(ARROW-7419, ARROW-7427).</li>
+</ul>
+
+<h2 id="r-notes">R notes</h2>
+
+<p>Highlights include support for the Feather V2 format and the C Data
Interface,
+both described above. Along with low-level bindings for the C interface, this
+release adds tooling to work with Arrow data in Python using <code
class="highlighter-rouge">reticulate</code>. See
+<a href="https://arrow.apache.org/docs/r/articles/python.html"><code
class="highlighter-rouge">vignette("python", package = "arrow")</code></a> for
a guide to getting started.</p>
+
+<p>Installation on Linux now builds the C++ library from source by default.
For a
+faster, richer build, set the environment variable <code
class="highlighter-rouge">NOT_CRAN=true</code>. See
+<a href="https://arrow.apache.org/docs/r/articles/install.html"><code
class="highlighter-rouge">vignette("install", package = "arrow")</code></a> for
details and more options.</p>
+
+<p>For more on what’s in the 0.17 R package, see the <a
href="https://arrow.apache.org/docs/r/news/">R changelog</a>.</p>
+
+<h2 id="ruby-and-c-glib-notes">Ruby and C GLib notes</h2>
+
+<h3 id="ruby">Ruby</h3>
+
+<ul>
+ <li>Support Ruby 2.3 again</li>
+</ul>
+
+<h3 id="c-glib">C GLib</h3>
+
+<ul>
+ <li>Add GArrowRecordBatchIterator</li>
+ <li>Add support for GArrowFilterOptions</li>
+ <li>Add support for Peek() to GIOInputStream</li>
+ <li>Add some metadata bindings to GArrowSchema</li>
+ <li>Add LocalFileSystem support</li>
+  <li>Add support for Parquet writer properties</li>
+ <li>Add support for MapArray</li>
+ <li>Add support for BooleanNode</li>
+</ul>
+
+<h2 id="rust-notes">Rust notes</h2>
+
+<ul>
+  <li>DictionaryArray support.</li>
+ <li>Various improvements to code safety.</li>
+ <li>Filter kernel now supports temporal types.</li>
+</ul>
+
+<h3 id="rust-parquet-notes">Rust Parquet notes</h3>
+
+<ul>
+ <li>Array reader now supports temporal types.</li>
+  <li>Parquet writer now supports custom metadata key/value pairs.</li>
+</ul>
+
+<h3 id="rust-datafusion-notes">Rust DataFusion notes</h3>
+
+<ul>
+ <li>Logical plans can now reference columns by name (as well as by index)
using
+the new <code class="highlighter-rouge">UnresolvedColumn</code> expression.
There is a new optimizer rule to
+resolve these into column indices.</li>
+ <li>Scalar UDFs can now be registered with the execution context and used
from
+logical query plans as well as from SQL. A number of math scalar functions
+have been implemented using this feature (sqrt, cos, sin, tan, asin, acos,
+atan, floor, ceil, round, trunc, abs, signum, exp, log, log2, log10).</li>
+ <li>Various SQL improvements, including support for <code
class="highlighter-rouge">SELECT *</code> and <code
class="highlighter-rouge">SELECT
+COUNT(*)</code>, and improvements to parsing of aggregate queries.</li>
+ <li>Flight examples are provided, with a client that sends a SQL statement
to a
+Flight server and receives the results.</li>
+ <li>The interactive SQL command-line tool now has improved documentation and
+better formatting of query results.</li>
+</ul>
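The optimizer rule's job can be sketched in Python for brevity (DataFusion itself is Rust; the names and plan shapes here are hypothetical): walk the expression tree and rewrite each name-based reference into its schema index:

```python
def resolve_columns(expr, schema):
    """Rewrite name-based column references into index-based ones,
    mimicking an optimizer rule that resolves UnresolvedColumn
    expressions against the plan's input schema."""
    kind = expr[0]
    if kind == "unresolved_column":
        return ("column", schema.index(expr[1]))
    if kind == "binary":
        _, op, left, right = expr
        return ("binary", op,
                resolve_columns(left, schema),
                resolve_columns(right, schema))
    return expr  # literals and already-resolved nodes pass through

schema = ["id", "name", "amount"]
plan = ("binary", "+", ("unresolved_column", "amount"), ("literal", 1))
assert resolve_columns(plan, schema) == ("binary", "+", ("column", 2), ("literal", 1))
```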
+
+<h2 id="project-operations">Project Operations</h2>
+
+<p>We’ve continued our migration of general automation toward GitHub Actions.
The
+majority of our commit-by-commit continuous integration (CI) is now running on
+GitHub Actions. We are working on different solutions for using dedicated
+hardware as part of our CI. The <a href="https://buildkite.com/">Buildkite</a>
self-hosted CI/CD platform is
+now supported on Apache repositories, and GitHub Actions also supports
+self-hosted runners.</p>
+
+
+ </main>
+
+ <hr/>
+<footer class="footer">
+ <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache
Arrow project logo are either registered trademarks or trademarks of The Apache
Software Foundation in the United States and other countries.</p>
+ <p>© 2016-2019 The Apache Software Foundation</p>
+ <script integrity="sha256-GM0wKVV/c8HuguQRExJ7BPb82ExW2dsMucQOvibvbjM="
crossorigin="anonymous" type="text/javascript"
src="/assets/main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js"></script>
+</footer>
+
+ </div>
+</body>
+</html>
diff --git a/blog/index.html b/blog/index.html
index 40ecefa..1056b3a 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -217,6 +217,21 @@
<p>
<h3>
+ <a href="/blog/2020/04/21/0.17.0-release/">Apache Arrow 0.17.0
Release</a>
+ </h3>
+
+ <p>
+ <span class="blog-list-date">
+ 21 April 2020
+ </span>
+ </p>
+ The Apache Arrow team is pleased to announce the 0.17.0 release. This
covers over 2 months of development work and includes 569 resolved issues from
79 distinct contributors. See the Install Page to learn how to get the
libraries for your platform. The release notes below are not exhaustive and...
+ </p>
+
+
+
+ <p>
+ <h3>
<a href="/blog/2020/03/31/fuzzing-arrow-ipc/">Fuzzing the Arrow C++ IPC
implementation</a>
</h3>
diff --git a/feed.xml b/feed.xml
index 04feb74..b13ac67 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,247 @@
-<?xml version="1.0" encoding="utf-8"?><feed
xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/"
version="3.8.4">Jekyll</generator><link
href="https://arrow.apache.org/feed.xml" rel="self" type="application/atom+xml"
/><link href="https://arrow.apache.org/" rel="alternate" type="text/html"
/><updated>2020-04-21T08:23:33-04:00</updated><id>https://arrow.apache.org/feed.xml</id><title
type="html">Apache Arrow</title><subtitle>Apache Arrow is a cross-language
developm [...]
+<?xml version="1.0" encoding="utf-8"?><feed
xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/"
version="3.8.4">Jekyll</generator><link
href="https://arrow.apache.org/feed.xml" rel="self" type="application/atom+xml"
/><link href="https://arrow.apache.org/" rel="alternate" type="text/html"
/><updated>2020-04-22T19:55:16-04:00</updated><id>https://arrow.apache.org/feed.xml</id><title
type="html">Apache Arrow</title><subtitle>Apache Arrow is a cross-language
developm [...]
+
+-->
+
+<p>The Apache Arrow team is pleased to announce the 0.17.0 release. This
covers
+over 2 months of development work and includes <a
href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.17.0"><strong>569
resolved issues</strong></a>
+from <a
href="https://arrow.apache.org/release/0.17.0.html#contributors"><strong>79
distinct contributors</strong></a>. See the Install Page to learn
how to
+get the libraries for your platform.</p>
+
+<p>The release notes below are not exhaustive and only expose selected
highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the <a
href="https://arrow.apache.org/release/0.17.0.html">complete
changelog</a>.</p>
+
+<h2 id="community">Community</h2>
+
+<p>Since the 0.16.0 release, two committers have joined the Project
Management
+Committee (PMC):</p>
+
+<ul>
+ <li><a href="https://github.com/nealrichardson">Neal
Richardson</a></li>
+ <li><a
href="https://github.com/fsaintjacques">François
Saint-Jacques</a></li>
+</ul>
+
+<p>Thank you for all your contributions!</p>
+
+<h2 id="columnar-format-notes">Columnar Format Notes</h2>
+
+<p>A <a
href="https://arrow.apache.org/docs/format/CDataInterface.html">C-level
Data Interface</a> was designed to ease data sharing inside a single
+process. It allows different runtimes or libraries to share Arrow data using a
+well-known binary layout and metadata representation, without any copies. Third-party
+libraries can use the C interface to import and export the Arrow columnar
+format in-process without requiring any new code dependencies.</p>
+
+<p>The C++ library now includes an implementation of the C Data
Interface, and
+Python and R have bindings to that implementation.</p>
+
+<h2 id="arrow-flight-rpc-notes">Arrow Flight RPC
notes</h2>
+
+<ul>
+ <li>Adopted new DoExchange bi-directional data RPC</li>
+ <li>ListFlights supports being passed a Criteria argument in
+Java/C++/Python. This allows applications to search for flights satisfying a
+given query.</li>
+ <li>Custom metadata can be attached to errors that the server sends to
the
+client, which can be used to encode richer application-specific
information.</li>
+ <li>A number of minor bugs were fixed, including proper handling of
empty null
+arrays in Java and round-tripping of certain Arrow status codes in
+C++/Python.</li>
+</ul>
+
+<h2 id="c-notes">C++ notes</h2>
+
+<h3 id="feather-v2">Feather V2</h3>
+
+<p>The “Feather V2” format based on the Arrow IPC file format was
developed.
+Feather V2 features full support for all Arrow data types, and resolves the 2GB
+per-column limitation for large amounts of string data that the <a
href="https://github.com/wesm/feather">original
+Feather implementation</a> had. Feather V2 also introduces experimental
IPC
+message compression using LZ4 frame format or ZSTD. This will be formalized
+later in the Arrow format.</p>
+
+<h3 id="c-datasets">C++ Datasets</h3>
+
+<ul>
+  <li>Improve speed on high-latency file systems by relaxing discovery
+validation</li>
+ <li>Better performance with Arrow IPC files using column
projection</li>
+ <li>Add the ability to list files in FileSystemDataset</li>
+ <li>Add support for Parquet file reader options</li>
+ <li>Support dictionary columns in partition expression</li>
+ <li>Fix various crashes and other issues</li>
+</ul>
+
+<h3 id="c-parquet-notes">C++ Parquet notes</h3>
+
+<ul>
+  <li>Support for writing nested types to Parquet format was
+completed. The legacy code path can still be enabled through a Parquet write
+option in C++ and an environment variable in Python. Read support will come in
+a future release.</li>
+  <li>The BYTE_STREAM_SPLIT encoding was implemented for floating-point types.
+It can improve compression efficiency for high-entropy data.</li>
+ <li>Expose Parquet schema field_id as Arrow field metadata</li>
+ <li>Support for DataPageV2 data page format</li>
+</ul>
+
+<h3 id="c-build-notes">C++ build notes</h3>
+
+<ul>
+ <li>We continued to make the core C++ library build simpler and
faster. Among the
+improvements are the removal of the dependency on Thrift IDL compiler at
+build time; while Parquet still requires the Thrift runtime C++ library, its
+dependencies are much lighter. We also further reduced the number of build
+configurations that require Boost, and when Boost needs to be built, we
+only download the components we need, reducing the size of the Boost bundle
+by 90%.</li>
+ <li>Improved support for building on ARM platforms</li>
+ <li>Upgraded LLVM version from 7 to 8</li>
+ <li>Simplified SIMD build configuration with the ARROW_SIMD_LEVEL option,
+allowing selection of no SIMD, SSE4.2, AVX2, or AVX512.</li>
+ <li>Fixed a number of bugs affecting compilation on aarch64
platforms</li>
+</ul>
+
+<h3 id="other-c-notes">Other C++ notes</h3>
+
+<ul>
+ <li>Many crashes on invalid input detected by <a
href="https://google.github.io/oss-fuzz/">OSS-Fuzz</a> in
the IPC reader and
+in Parquet-Arrow reading were fixed. See our recent <a
href="https://arrow.apache.org/blog/2020/03/31/fuzzing-arrow-ipc/">blog
post</a> for more
+details.</li>
+ <li>A “Device” abstraction was added to simplify buffer management and
movement
+across heterogeneous hardware configurations, e.g. CPUs and GPUs.</li>
+ <li>A streaming CSV reader was implemented, yielding individual
RecordBatches and
+helping limit overall memory usage.</li>
+ <li>Array casting from Decimal128 to integer types and to Decimal128
with
+different scale/precision was added.</li>
+ <li>Sparse CSF tensors are now supported.</li>
+ <li>When creating an Array, the null bitmap is not kept if the null
count is known to be zero</li>
+ <li>Compressor support for the LZ4 frame format (LZ4_FRAME) was
added</li>
+ <li>An event-driven interface for reading IPC streams was
added.</li>
+ <li>Further core APIs that required passing an explicit out-parameter
were
+migrated to <code
class="highlighter-rouge">Result&lt;T&gt;</code>.</li>
+ <li>New analytics kernels for match, sort indices / argsort,
top-k</li>
+</ul>
+
+<h2 id="java-notes">Java notes</h2>
+
+<ul>
+ <li>Netty dependencies were removed for BufferAllocator and
ReferenceManager
+classes. In the future, we plan to move Netty-related classes to a separate
+module.</li>
+ <li>New features were added to support efficiently appending vector and
+VectorSchemaRoot values in batch.</li>
+ <li>Comparing a range of values in dense union vectors is now
supported.</li>
+ <li>The quick sort algorithm was improved to avoid degenerating to the
worst case.</li>
+</ul>
+
+<h2 id="python-notes">Python notes</h2>
+
+<h3 id="datasets">Datasets</h3>
+
+<ul>
+ <li>Updated <code
class="highlighter-rouge">pyarrow.dataset</code> module
following the changes in the C++ Datasets
+project. This release also adds <a
href="https://arrow.apache.org/docs/python/dataset.html">richer
documentation</a> on the datasets
+module.</li>
+ <li>Support for the improved dataset functionality in
+<code
class="highlighter-rouge">pyarrow.parquet.read_table/ParquetDataset</code>.
To enable, pass
+<code
class="highlighter-rouge">use_legacy_dataset=False</code>.
Among other things, this allows specifying filters
+for all columns and not only the partition keys (using row group statistics)
+and enables different partitioning schemes. See the “note” in the
+<a
href="https://arrow.apache.org/docs/python/parquet.html#reading-from-partitioned-datasets"><code
class="highlighter-rouge">ParquetDataset</code>
documentation</a>.</li>
+</ul>
+
+<h3 id="packaging">Packaging</h3>
+
+<ul>
+ <li>Wheels for Python 3.8 are now available</li>
+ <li>Support for Python 2.7 has been dropped as Python 2.x reached
end-of-life in
+January 2020.</li>
+ <li>Nightly wheels and conda packages are now available for testing or
other
+development purposes. See the <a
href="https://arrow.apache.org/docs/python/install.html#installing-nightly-packages">installation
guide</a></li>
+</ul>
+
+<h3 id="other-improvements">Other improvements</h3>
+
+<ul>
+ <li>Conversion to numpy/pandas for FixedSizeList, LargeString,
LargeBinary</li>
+ <li>Support for sparse CSC matrices and sparse CSF tensors was added.
(ARROW-7419,
+ARROW-7427)</li>
+</ul>
+
+<h2 id="r-notes">R notes</h2>
+
+<p>Highlights include support for the Feather V2 format and the C Data
Interface,
+both described above. Along with low-level bindings for the C interface, this
+release adds tooling to work with Arrow data in Python using <code
class="highlighter-rouge">reticulate</code>. See
+<a
href="https://arrow.apache.org/docs/r/articles/python.html"><code
class="highlighter-rouge">vignette("python", package =
"arrow")</code></a> for a guide to getting
started.</p>
+
+<p>Installation on Linux now builds the C++ library from source by
default. For a
+faster, richer build, set the environment variable <code
class="highlighter-rouge">NOT_CRAN=true</code>. See
+<a
href="https://arrow.apache.org/docs/r/articles/install.html"><code
class="highlighter-rouge">vignette("install", package =
"arrow")</code></a> for details and more
options.</p>
+
+<p>For more on what’s in the 0.17 R package, see the <a
href="https://arrow.apache.org/docs/r/news/">R
changelog</a>.</p>
+
+<h2 id="ruby-and-c-glib-notes">Ruby and C GLib notes</h2>
+
+<h3 id="ruby">Ruby</h3>
+
+<ul>
+ <li>Support Ruby 2.3 again</li>
+</ul>
+
+<h3 id="c-glib">C GLib</h3>
+
+<ul>
+ <li>Add GArrowRecordBatchIterator</li>
+ <li>Add support for GArrowFilterOptions</li>
+ <li>Add support for Peek() to GIOInputStream</li>
+ <li>Add some metadata bindings to GArrowSchema</li>
+ <li>Add LocalFileSystem support</li>
+ <li>Add support for writer properties of Parquet</li>
+ <li>Add support for MapArray</li>
+ <li>Add support for BooleanNode</li>
+</ul>
+
+<h2 id="rust-notes">Rust notes</h2>
+
+<ul>
+ <li>DictionaryArray support.</li>
+ <li>Various improvements to code safety.</li>
+ <li>Filter kernel now supports temporal types.</li>
+</ul>
+
+<h3 id="rust-parquet-notes">Rust Parquet notes</h3>
+
+<ul>
+ <li>Array reader now supports temporal types.</li>
+ <li>Parquet writer now supports custom meta-data key/value
pairs.</li>
+</ul>
+
+<h3 id="rust-datafusion-notes">Rust DataFusion notes</h3>
+
+<ul>
+ <li>Logical plans can now reference columns by name (as well as by
index) using
+the new <code
class="highlighter-rouge">UnresolvedColumn</code>
expression. There is a new optimizer rule to
+resolve these into column indices.</li>
+ <li>Scalar UDFs can now be registered with the execution context and
used from
+logical query plans as well as from SQL. A number of math scalar functions
+have been implemented using this feature (sqrt, cos, sin, tan, asin, acos,
+atan, floor, ceil, round, trunc, abs, signum, exp, log, log2,
log10).</li>
+ <li>Various SQL improvements, including support for <code
class="highlighter-rouge">SELECT *</code> and <code
class="highlighter-rouge">SELECT
+COUNT(*)</code>, and improvements to parsing of aggregate
queries.</li>
+ <li>Flight examples are provided, with a client that sends a SQL
statement to a
+Flight server and receives the results.</li>
+ <li>The interactive SQL command-line tool now has improved
documentation and
+better formatting of query results.</li>
+</ul>
+
+<h2 id="project-operations">Project Operations</h2>
+
+<p>We’ve continued our migration of general automation toward GitHub
Actions. The
+majority of our commit-by-commit continuous integration (CI) is now running on
+GitHub Actions. We are working on different solutions for using dedicated
+hardware as part of our CI. The <a
href="https://buildkite.com/">Buildkite</a> self-hosted
CI/CD platform is
+now supported on Apache repositories and GitHub Actions also supports
+self-hosted
runners.</p></content><author><name>pmc</name></author><summary
type="html">The Apache Arrow team is pleased to announce the 0.17.0 release.
This covers over 2 months of development work and includes 569 resolved issues
from 79 distinct contributors. See the Install Page to learn how to get the
libraries for your platform. The release notes below are not exhaustive and
only expose selected highlights of the release. Many other bugfixes and
improvements have been made: w [...]
-->
@@ -1780,210 +2023,4 @@ for C++</li>
data messaging use cases</li>
<li><strong>Arrow Columnar Format evolution</strong>: we
are discussing a new “duration” or
“time interval” type and some other additions to the Arrow columnar
format.</li>
-</ul></content><author><name>wesm</name></author><summary
type="html">The Apache Arrow team is pleased to announce the 0.13.0 release.
This covers more than 2 months of development work and includes 550 resolved
issues from 81 distinct contributors. See the Install Page to learn how to get
the libraries for your platform. The complete changelog is also available.
While it’s a large release, this post will give some brief highlights in the
project since the 0.12.0 release from Janua [...]
-
--->
-
-<p>Python users who upgrade to recently released <code
class="highlighter-rouge">pyarrow</code> 0.12 may find that
-their applications use significantly less memory when converting Arrow string
-data to pandas format. This includes using <code
class="highlighter-rouge">pyarrow.parquet.read_table</code>
and
-<code
class="highlighter-rouge">pandas.read_parquet</code>. This
article details some of what is going on under the
-hood, and why Python applications dealing with large amounts of strings are
-prone to memory use problems.</p>
-
-<h2 id="why-python-strings-can-use-a-lot-of-memory">Why Python
strings can use a lot of memory</h2>
-
-<p>Let’s start with some possibly surprising facts. I’m going to create
an empty
-<code class="highlighter-rouge">bytes</code> object and
an empty <code class="highlighter-rouge">str</code>
(unicode) object in Python 3.7:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [1]: val = b''
-
-In [2]: unicode_val = u''
-</code></pre></div></div>
-
-<p>The <code
class="highlighter-rouge">sys.getsizeof</code> function
accurately reports the number of bytes used by
-built-in Python objects. You might be surprised to find that:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [4]: import sys
-In [5]: sys.getsizeof(val)
-Out[5]: 33
-
-In [6]: sys.getsizeof(unicode_val)
-Out[6]: 49
-</code></pre></div></div>
-
-<p>Since strings in Python are nul-terminated, we can infer that a bytes
object
-has 32 bytes of overhead while unicode has 48 bytes. One must also account for
-<code class="highlighter-rouge">PyObject*</code> pointer
references to the objects, so the actual overhead is 40 and
-56 bytes, respectively. With large strings and text, this overhead may not
-matter much, but when you have a lot of small strings, such as those arising
-from reading a CSV or Apache Parquet file, they can take up an unexpected
-amount of memory. pandas represents strings in NumPy arrays of <code
class="highlighter-rouge">PyObject*</code>
-pointers, so the total memory used by a unique unicode string is</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>8 (PyObject*) + 48 (Python C struct)
+ string_length + 1
-</code></pre></div></div>
-
-<p>Suppose that we read a CSV file with</p>
-
-<ul>
- <li>1 column</li>
- <li>1 million rows</li>
- <li>Each value in the column is a string with 10 characters</li>
-</ul>
-
-<p>On disk this file would take approximately 10MB. Read into memory,
however, it
-could take up over 60MB, as a 10 character string object takes up 67 bytes in a
-<code
class="highlighter-rouge">pandas.Series</code>.</p>
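That arithmetic can be checked directly with `sys.getsizeof` (a sketch; the exact sizes are CPython-specific and can vary slightly between versions):

```python
import sys

# Per-row footprint of a unique 10-character string in an object column:
# the str struct reported by getsizeof (including the nul terminator)
# plus the 8-byte PyObject* pointer stored in the array.
per_value = 8 + sys.getsizeof("x" * 10)  # ~67 bytes on CPython 3.x

n_rows = 1_000_000
on_disk_mb = n_rows * 11 / 2**20       # 10 chars + newline, ~10.5 MB
in_memory_mb = n_rows * per_value / 2**20
```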
-
-<h2 id="how-apache-arrow-represents-strings">How Apache Arrow
represents strings</h2>
-
-<p>While a Python unicode string can have 57 bytes of overhead, a string
in the
-Arrow columnar format has only 4 (32 bits) or 4.125 (33 bits) bytes of
-overhead. 32-bit integer offsets encode the position and size of a string
-value in a contiguous chunk of memory:</p>
-
-<div align="center">
-<img src="/img/20190205-arrow-string.png" alt="Apache Arrow
string memory layout" width="80%"
class="img-responsive" />
-</div>
-
-<p>When you call <code
class="highlighter-rouge">table.to_pandas()</code> or
<code class="highlighter-rouge">array.to_pandas()</code>
with <code class="highlighter-rouge">pyarrow</code>, we
-have to convert this compact string representation back to pandas’s
-Python-based strings. This can use a huge amount of memory when we have a large
-number of small strings. It is a quite common occurrence when working with web
-analytics data, which compresses to a compact size when stored in the Parquet
-columnar file format.</p>
-
-<p>Note that the Arrow string memory format has other benefits beyond
memory
-use. It is also much more efficient for analytics due to the guarantee of data
-locality; all strings are next to each other in memory. In the case of pandas
-and Python strings, the string data can be located anywhere in the process
-heap. Arrow PMC member Uwe Korn did some work to <a
href="https://www.slideshare.net/xhochy/extending-pandas-using-apache-arrow-and-numba">extend
pandas with Arrow
-string arrays</a> for improved performance and memory use.</p>
-
-<h2
id="reducing-pandas-memory-use-when-converting-from-arrow">Reducing
pandas memory use when converting from Arrow</h2>
-
-<p>For many years, the <code
class="highlighter-rouge">pandas.read_csv</code> function
has relied on a trick to limit
-the amount of string memory allocated. Because pandas uses arrays of
-<code class="highlighter-rouge">PyObject*</code>
pointers to refer to objects in the Python heap, we can avoid
-creating multiple strings with the same value, instead reusing existing objects
-and incrementing their reference counts.</p>
-
-<p>Schematically, we have the following:</p>
-
-<div align="center">
-<img src="/img/20190205-numpy-string.png" alt="pandas string
memory optimization" width="80%"
class="img-responsive" />
-</div>
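The idea can be sketched in pure Python with a small intern table (illustrative only; pyarrow implements the equivalent logic in C++ during the conversion):

```python
def deduplicate(values):
    # Keep one Python object per distinct value and reuse it for every
    # duplicate row, bumping its refcount instead of allocating a new str.
    interned = {}
    return [interned.setdefault(v, v) for v in values]

# 1000 rows but only 10 distinct values; each row below is a fresh object.
rows = ["value_%d" % (i % 10) for i in range(1000)]
deduped = deduplicate(rows)
```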
-
-<p>In <code
class="highlighter-rouge">pyarrow</code> 0.12, we have
implemented this when calling <code
class="highlighter-rouge">to_pandas</code>. It
-requires using a hash table to deduplicate the Arrow string data as it’s being
-converted to pandas. Hashing data is not free, but counterintuitively it can be
-faster in addition to being vastly more memory efficient in the common case in
-analytics where we have table columns with many instances of the same string
-values.</p>
-
-<h2 id="memory-and-performance-benchmarks">Memory and
Performance Benchmarks</h2>
-
-<p>We can use the <a
href="https://pypi.org/project/memory-profiler/"><code
class="highlighter-rouge">memory_profiler</code></a>
Python package to easily get process
-memory usage within a running Python application.</p>
-
-<div class="language-python highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code><span
class="kn">import</span> <span
class="nn">memory_profiler</span>
-<span class="k">def</span> <span
class="nf">mem</span><span
class="p">():</span>
- <span class="k">return</span> <span
class="n">memory_profiler</span><span
class="p">.</span><span
class="n">memory_usage</span><span
class="p">()[</span><span
class="mi">0</span><span
class="p">]</span>
-</code></pre></div></div>
-
-<p>In a new application I have:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [7]: mem()
-Out[7]: 86.21875
-</code></pre></div></div>
-
-<p>I will generate approximately 1 gigabyte of string data represented as
Python
-strings with length 10. The <code
class="highlighter-rouge">pandas.util.testing</code> module
has a handy <code class="highlighter-rouge">rands</code>
-function for generating random strings. Here is the data generation
function:</p>
-
-<div class="language-python highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code><span
class="kn">from</span> <span
class="nn">pandas.util.testing</span> <span
class="kn">import</span> <span
class="n">rands</span>
-<span class="k">def</span> <span
class="nf">generate_strings</span><span
class="p">(</span><span
class="n">length</span><span
class="p">,</span> <span
class="n">nunique</span><span
class="p">,</span> <span
class="n">string_length</span><span
class="o">=</span><span class="mi">1 [...]
- <span class="n">unique_values</span> <span
class="o">=</span> <span
class="p">[</span><span
class="n">rands</span><span
class="p">(</span><span
class="n">string_length</span><span
class="p">)</span> <span
class="k">for</span> <span
class="n">i</span> <span class="ow">in<
[...]
- <span class="n">values</span> <span
class="o">=</span> <span
class="n">unique_values</span> <span
class="o">*</span> <span
class="p">(</span><span
class="n">length</span> <span
class="o">//</span> <span
class="n">nunique</span><span
class="p">)</span>
- <span class="k">return</span> <span
class="n">values</span>
-</code></pre></div></div>
-
-<p>This generates a certain number of unique strings, then duplicates
them to
-yield the desired number of total strings. So I’m going to create 100 million
-strings with only 10000 unique values:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [8]: values =
generate_strings(100000000, 10000)
-
-In [9]: mem()
-Out[9]: 852.140625
-</code></pre></div></div>
-
-<p>100 million <code
class="highlighter-rouge">PyObject*</code> values is about
763 MB, so this increase of a little
-under 770 MB is consistent with what we know so far. Now I’m going to convert
-this to Arrow format:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [11]: arr = pa.array(values)
-
-In [12]: mem()
-Out[12]: 2276.9609375
-</code></pre></div></div>
-
-<p>Since <code
class="highlighter-rouge">pyarrow</code> exactly accounts
for all of its memory allocations, we also
-check that</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [13]: pa.total_allocated_bytes()
-Out[13]: 1416777280
-</code></pre></div></div>
-
-<p>Since each string takes about 14 bytes (10 bytes plus 4 bytes of
overhead),
-this is what we expect.</p>
-
-<p>Now, converting <code
class="highlighter-rouge">arr</code> back to pandas is where
things get tricky. The <em>minimum</em>
-amount of memory that pandas can use is a little under 800 MB, as above, since we
-need 100 million <code
class="highlighter-rouge">PyObject*</code> values, which are
8 bytes each.</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [14]: arr_as_pandas =
arr.to_pandas()
-
-In [15]: mem()
-Out[15]: 3041.78125
-</code></pre></div></div>
-
-<p>Doing the math, we used 765 MB which seems right. We can disable the
string
-deduplication logic by passing <code
class="highlighter-rouge">deduplicate_objects=False</code>
to <code
class="highlighter-rouge">to_pandas</code>:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [16]: arr_as_pandas_no_dedup =
arr.to_pandas(deduplicate_objects=False)
-
-In [17]: mem()
-Out[17]: 10006.95703125
-</code></pre></div></div>
-
-<p>Without object deduplication, we use 6965 megabytes, or an average of
73 bytes
-per value. This is a little bit higher than the theoretical size of 67 bytes
-computed above.</p>
-
-<p>One of the more surprising results is that the new behavior is about
twice as fast:</p>
-
-<div class="highlighter-rouge"><div
class="highlight"><pre
class="highlight"><code>In [18]: %time arr_as_pandas_time =
arr.to_pandas()
-CPU times: user 2.94 s, sys: 213 ms, total: 3.15 s
-Wall time: 3.14 s
-
-In [19]: %time arr_as_pandas_no_dedup_time =
arr.to_pandas(deduplicate_objects=False)
-CPU times: user 4.19 s, sys: 2.04 s, total: 6.23 s
-Wall time: 6.21 s
-</code></pre></div></div>
-
-<p>The reason for this is that creating so many Python objects is more
expensive
-than hashing the 10 byte values and looking them up in a hash table.</p>
-
-<p>Note that when you convert Arrow data with mostly unique values back
to pandas,
-the memory use benefits here won’t have as much of an impact.</p>
-
-<h2 id="takeaways">Takeaways</h2>
-
-<p>In Apache Arrow, our goal is to develop computational tools to
operate natively
-on the cache- and SIMD-friendly efficient Arrow columnar format. In the
-meantime, though, we recognize that users have legacy applications using the
-native memory layout of pandas or other analytics tools. We will do our best to
-provide fast and memory-efficient interoperability with pandas and other
-popular
libraries.</p></content><author><name>wesm</name></author><summary
type="html">Python users who upgrade to recently released pyarrow 0.12 may find
that their applications use significantly less memory when converting Arrow
string data to pandas format. This includes using pyarrow.parquet.read_table
and pandas.read_parquet. This article details some of what is going on under
the hood, and why Python applications dealing with large amounts of strings are
prone to memory use p [...]
\ No newline at end of file
+</ul></content><author><name>wesm</name></author><summary
type="html">The Apache Arrow team is pleased to announce the 0.13.0 release.
This covers more than 2 months of development work and includes 550 resolved
issues from 81 distinct contributors. See the Install Page to learn how to get
the libraries for your platform. The complete changelog is also available.
While it’s a large release, this post will give some brief highlights in the
project since the 0.12.0 release from Janua [...]
\ No newline at end of file