This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 25fedb5  Updating built site (build 
8b65ef4cca454979abc8af8d802c7691ee803b32)
25fedb5 is described below

commit 25fedb51241ffeb1c52b839cb28a48244c239aad
Author: Neal Richardson <[email protected]>
AuthorDate: Fri Oct 23 15:24:31 2020 +0000

    Updating built site (build 8b65ef4cca454979abc8af8d802c7691ee803b32)
---
 ...manifest-59b8ad9b3f186aacb3e1ce1050f0c6bf.json} |   2 +-
 blog/2020/10/22/2.0.0-release/index.html           | 423 +++++++++++++++++++++
 blog/index.html                                    |  15 +
 feed.xml                                           | 371 ++++++++----------
 4 files changed, 606 insertions(+), 205 deletions(-)

diff --git a/assets/.sprockets-manifest-1cfd625f166867f000a50a1d51a6ddd3.json 
b/assets/.sprockets-manifest-59b8ad9b3f186aacb3e1ce1050f0c6bf.json
similarity index 79%
rename from assets/.sprockets-manifest-1cfd625f166867f000a50a1d51a6ddd3.json
rename to assets/.sprockets-manifest-59b8ad9b3f186aacb3e1ce1050f0c6bf.json
index 4fd2c9c..b2136bb 100644
--- a/assets/.sprockets-manifest-1cfd625f166867f000a50a1d51a6ddd3.json
+++ b/assets/.sprockets-manifest-59b8ad9b3f186aacb3e1ce1050f0c6bf.json
@@ -1 +1 @@
-{"files":{"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js":{"logical_path":"main.js","mtime":"2020-10-21T10:04:17-04:00","size":124531,"digest":"18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33","integrity":"sha256-GM0wKVV/c8HuguQRExJ7BPb82ExW2dsMucQOvibvbjM="}},"assets":{"main.js":"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js"}}
\ No newline at end of file
+{"files":{"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js":{"logical_path":"main.js","mtime":"2020-10-23T11:24:18-04:00","size":124531,"digest":"18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33","integrity":"sha256-GM0wKVV/c8HuguQRExJ7BPb82ExW2dsMucQOvibvbjM="}},"assets":{"main.js":"main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js"}}
\ No newline at end of file
diff --git a/blog/2020/10/22/2.0.0-release/index.html 
b/blog/2020/10/22/2.0.0-release/index.html
new file mode 100644
index 0000000..c4196ed
--- /dev/null
+++ b/blog/2020/10/22/2.0.0-release/index.html
@@ -0,0 +1,423 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    
+    <title>Apache Arrow 2.0.0 Release | Apache Arrow</title>
+    
+
+    <!-- Begin Jekyll SEO tag v2.7.1 -->
+<meta name="generator" content="Jekyll v3.8.4" />
+<meta property="og:title" content="Apache Arrow 2.0.0 Release" />
+<meta name="author" content="pmc" />
+<meta property="og:locale" content="en_US" />
+<meta name="description" content="The Apache Arrow team is pleased to announce 
the 2.0.0 release. This covers over 3 months of development work and includes 
511 resolved issues from 81 distinct contributors. See the Install Page to 
learn how to get the libraries for your platform. The release notes below are 
not exhaustive and only expose selected highlights of the release. Many other 
bugfixes and improvements have been made: we refer you to the complete 
changelog. Community Since the 1. [...]
+<meta property="og:description" content="The Apache Arrow team is pleased to 
announce the 2.0.0 release. This covers over 3 months of development work and 
includes 511 resolved issues from 81 distinct contributors. See the Install 
Page to learn how to get the libraries for your platform. The release notes 
below are not exhaustive and only expose selected highlights of the release. 
Many other bugfixes and improvements have been made: we refer you to the 
complete changelog. Community Since [...]
+<link rel="canonical" 
href="https://arrow.apache.org/blog/2020/10/22/2.0.0-release/"; />
+<meta property="og:url" 
content="https://arrow.apache.org/blog/2020/10/22/2.0.0-release/"; />
+<meta property="og:site_name" content="Apache Arrow" />
+<meta property="og:image" content="https://arrow.apache.org/img/arrow.png"; />
+<meta property="og:type" content="article" />
+<meta property="article:published_time" content="2020-10-22T02:00:00-04:00" />
+<meta name="twitter:card" content="summary_large_image" />
+<meta property="twitter:image" 
content="https://arrow.apache.org/img/arrow.png"; />
+<meta property="twitter:title" content="Apache Arrow 2.0.0 Release" />
+<meta name="twitter:site" content="@ApacheArrow" />
+<meta name="twitter:creator" content="@pmc" />
+<script type="application/ld+json">
+{"image":"https://arrow.apache.org/img/arrow.png","@type":"BlogPosting","url":"https://arrow.apache.org/blog/2020/10/22/2.0.0-release/","headline":"Apache
 Arrow 2.0.0 
Release","dateModified":"2020-10-22T02:00:00-04:00","datePublished":"2020-10-22T02:00:00-04:00","author":{"@type":"Person","name":"pmc"},"mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2020/10/22/2.0.0-release/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arro
 [...]
+<!-- End Jekyll SEO tag -->
+
+
+    <!-- favicons -->
+    <link rel="icon" type="image/png" sizes="16x16" 
href="/img/favicon-16x16.png" id="light1">
+    <link rel="icon" type="image/png" sizes="32x32" 
href="/img/favicon-32x32.png" id="light2">
+    <link rel="apple-touch-icon" type="image/png" sizes="180x180" 
href="/img/apple-touch-icon.png" id="light3">
+    <link rel="apple-touch-icon" type="image/png" sizes="120x120" 
href="/img/apple-touch-icon-120x120.png" id="light4">
+    <link rel="apple-touch-icon" type="image/png" sizes="76x76" 
href="/img/apple-touch-icon-76x76.png" id="light5">
+    <link rel="apple-touch-icon" type="image/png" sizes="60x60" 
href="/img/apple-touch-icon-60x60.png" id="light6">
+    <!-- dark mode favicons -->
+    <link rel="icon" type="image/png" sizes="16x16" 
href="/img/favicon-16x16-dark.png" id="dark1">
+    <link rel="icon" type="image/png" sizes="32x32" 
href="/img/favicon-32x32-dark.png" id="dark2">
+    <link rel="apple-touch-icon" type="image/png" sizes="180x180" 
href="/img/apple-touch-icon-dark.png" id="dark3">
+    <link rel="apple-touch-icon" type="image/png" sizes="120x120" 
href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
+    <link rel="apple-touch-icon" type="image/png" sizes="76x76" 
href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
+    <link rel="apple-touch-icon" type="image/png" sizes="60x60" 
href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
+
+    <script>
+      // Switch to the dark-mode favicons if prefers-color-scheme: dark
+      function onUpdate() {
+        light1 = document.querySelector('link#light1');
+        light2 = document.querySelector('link#light2');
+        light3 = document.querySelector('link#light3');
+        light4 = document.querySelector('link#light4');
+        light5 = document.querySelector('link#light5');
+        light6 = document.querySelector('link#light6');
+
+        dark1 = document.querySelector('link#dark1');
+        dark2 = document.querySelector('link#dark2');
+        dark3 = document.querySelector('link#dark3');
+        dark4 = document.querySelector('link#dark4');
+        dark5 = document.querySelector('link#dark5');
+        dark6 = document.querySelector('link#dark6');
+
+        if (matcher.matches) {
+          light1.remove();
+          light2.remove();
+          light3.remove();
+          light4.remove();
+          light5.remove();
+          light6.remove();
+          document.head.append(dark1);
+          document.head.append(dark2);
+          document.head.append(dark3);
+          document.head.append(dark4);
+          document.head.append(dark5);
+          document.head.append(dark6);
+        } else {
+          dark1.remove();
+          dark2.remove();
+          dark3.remove();
+          dark4.remove();
+          dark5.remove();
+          dark6.remove();
+          document.head.append(light1);
+          document.head.append(light2);
+          document.head.append(light3);
+          document.head.append(light4);
+          document.head.append(light5);
+          document.head.append(light6);
+        }
+      }
+      matcher = window.matchMedia('(prefers-color-scheme: dark)');
+      matcher.addListener(onUpdate);
+      onUpdate();
+    </script>
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"; 
integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
 crossorigin="anonymous"></script>
+    <script 
src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js"; 
integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49"
 crossorigin="anonymous"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+<body class="wrap">
+  <header>
+    <nav class="navbar navbar-expand-md navbar-dark bg-dark">
+  
+  <a class="navbar-brand no-padding" href="/"><img 
src="/img/arrow-inverse-300px.png" height="40px"/></a>
+  
+   <button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" 
data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" 
aria-label="Toggle navigation">
+    <span class="navbar-toggler-icon"></span>
+  </button>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse justify-content-end" 
id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="nav-item"><a class="nav-link" href="/overview/" 
role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li>
+        <li class="nav-item"><a class="nav-link" href="/faq/" role="button" 
aria-haspopup="true" aria-expanded="false">FAQ</a></li>
+        <li class="nav-item"><a class="nav-link" href="/blog" role="button" 
aria-haspopup="true" aria-expanded="false">Blog</a></li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownGetArrow" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Get Arrow
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow">
+            <a class="dropdown-item" href="/install/">Install</a>
+            <a class="dropdown-item" href="/release/">Releases</a>
+            <a class="dropdown-item" 
href="https://github.com/apache/arrow";>Source Code</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownDocumentation" role="button" 
data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Documentation
+          </a>
+          <div class="dropdown-menu" 
aria-labelledby="navbarDropdownDocumentation">
+            <a class="dropdown-item" href="/docs">Project Docs</a>
+            <a class="dropdown-item" 
href="/docs/format/Columnar.html">Format</a>
+            <hr/>
+            <a class="dropdown-item" href="/docs/c_glib">C GLib</a>
+            <a class="dropdown-item" href="/docs/cpp">C++</a>
+            <a class="dropdown-item" 
href="https://github.com/apache/arrow/blob/master/csharp/README.md";>C#</a>
+            <a class="dropdown-item" 
href="https://godoc.org/github.com/apache/arrow/go/arrow";>Go</a>
+            <a class="dropdown-item" href="/docs/java">Java</a>
+            <a class="dropdown-item" href="/docs/js">JavaScript</a>
+            <a class="dropdown-item" 
href="https://github.com/apache/arrow/blob/master/matlab/README.md";>MATLAB</a>
+            <a class="dropdown-item" href="/docs/python">Python</a>
+            <a class="dropdown-item" href="/docs/r">R</a>
+            <a class="dropdown-item" 
href="https://github.com/apache/arrow/blob/master/ruby/README.md";>Ruby</a>
+            <a class="dropdown-item" 
href="https://docs.rs/crate/arrow/";>Rust</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Community
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
+            <a class="dropdown-item" href="/community/">Communication</a>
+            <a class="dropdown-item" 
href="/docs/developers/contributing.html">Contributing</a>
+            <a class="dropdown-item" 
href="https://issues.apache.org/jira/browse/ARROW";>Issue Tracker</a>
+            <a class="dropdown-item" href="/committers/">Governance</a>
+            <a class="dropdown-item" href="/use_cases/">Use Cases</a>
+            <a class="dropdown-item" href="/powered_by/">Powered By</a>
+            <a class="dropdown-item" href="/security/">Security</a>
+            <a class="dropdown-item" 
href="https://www.apache.org/foundation/policies/conduct.html";>Code of 
Conduct</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownASF" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             ASF Links
+          </a>
+          <div class="dropdown-menu dropdown-menu-right" 
aria-labelledby="navbarDropdownASF">
+            <a class="dropdown-item" href="http://www.apache.org/";>ASF 
Website</a>
+            <a class="dropdown-item" 
href="http://www.apache.org/licenses/";>License</a>
+            <a class="dropdown-item" 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a>
+            <a class="dropdown-item" 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a>
+            <a class="dropdown-item" 
href="http://www.apache.org/security/";>Security</a>
+          </div>
+        </li>
+      </ul>
+    </div><!-- /.navbar-collapse -->
+  </nav>
+
+  </header>
+
+  <div class="container p-4 pt-5">
+    <div class="col-md-8 mx-auto">
+      <main role="main" class="pb-5">
+        
+<h1>
+  Apache Arrow 2.0.0 Release
+</h1>
+<hr class="mt-4 mb-3">
+
+
+
+<p class="mb-4 pb-1">
+  <span class="badge badge-secondary">Published</span>
+  <span class="published mr-3">
+    22 Oct 2020
+  </span>
+  <br />
+  <span class="badge badge-secondary">By</span>
+  
+    <a class="mr-3" href="https://arrow.apache.org">The Apache Arrow PMC (pmc)</a>
+  
+
+  
+</p>
+
+
+        <!--
+
+-->
+
+<p>The Apache Arrow team is pleased to announce the 2.0.0 release. This covers
+over 3 months of development work and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%202.0.0"><strong>511 resolved issues</strong></a>
+from <a href="/release/2.0.0.html#contributors"><strong>81 distinct contributors</strong></a>. See the Install Page to learn how to
+get the libraries for your platform.</p>
+
+<p>The release notes below are not exhaustive and only expose selected 
highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the <a href="/release/2.0.0.html">complete changelog</a>.</p>
+
+<h2 id="community">Community</h2>
+
+<p>Since the 1.0.0 release, <a href="https://github.com/jorgecarleitao">Jorge Leitão</a> has been added as a committer. Thank
+you for your contributions!</p>
+
+<h2 id="columnar-format">Columnar Format</h2>
+
+<p>As this is the first major release since 1.0.0, we remind everyone that we have
+moved to a “split” versioning system in which the Library version (now
+2.0.0) evolves separately from the Format version (still
+1.0.0). Major releases of the libraries may contain non-backward-compatible API
+changes, but they will not contain any incompatible format changes. See the
+<a href="http://arrow.apache.org/docs/format/Versioning.html">Versioning and Stability</a> page in the documentation for more details.</p>
+
+<p>The columnar format metadata has been updated to permit 256-bit decimal 
values
+in addition to 128-bit decimals. This change is backward and forward
+compatible.</p>
+
+<h2 id="arrow-flight-rpc-notes">Arrow Flight RPC notes</h2>
+
+<p>For Arrow Flight, 2.0.0 mostly brings bugfixes. In Java, some memory leaks 
in
+<code class="highlighter-rouge">FlightStream</code> and <code 
class="highlighter-rouge">DoPut</code> have been addressed. In C++ and Python, 
a deadlock
+has been fixed in an edge case. Additionally, when supported by gRPC, TLS
+verification can be disabled.</p>
+
+<h2 id="c-notes">C++ notes</h2>
+
+<p>Parquet reading now fully supports round trips of arbitrarily nested data,
+including extension types with a nested storage type. In the process, several
+bugs in writing nested data and FixedSizeList were fixed.  If you write data with
+these types, we recommend upgrading to this release and validating old data, as
+there is potential for data loss.</p>
+
+<p>Datasets can now be written with partitioning, including these features:</p>
+
+<ul>
+  <li>Writing to Parquet, including control over accumulation of statistics for
+individual columns.</li>
+  <li>Writing to IPC/Feather, including body buffer compression.</li>
+  <li>Basenames of written files can be specified with a string template, 
allowing
+non-colliding writes into the same partitioned dataset.</li>
+</ul>
+
+<p>Other notable features in the release include</p>
+
+<ul>
+  <li>Compute kernels for standard deviation, variance, and mode</li>
+  <li>Improvements to S3 support, including automatic region detection</li>
+  <li>CSV reading now parses the Date type and can create Dictionary types</li>
+</ul>
+
+<h2 id="c-notes-1">C# notes</h2>
+
+<p>The .NET package has added a number of new features this release.</p>
+
+<p>Full support for <code class="highlighter-rouge">Struct</code> types.</p>
+
+<p>Synchronous write APIs for <code class="highlighter-rouge">ArrowStreamWriter</code> and <code class="highlighter-rouge">ArrowFileWriter</code>. These are
+complementary to the existing async write APIs, and can be used in situations
+where the async APIs can’t be used.</p>
+
+<p>The ability to use <code class="highlighter-rouge">DateTime</code> 
instances with <code class="highlighter-rouge">Date32Array</code> and <code 
class="highlighter-rouge">Date64Array</code>.</p>
+
+<h2 id="java-notes">Java notes</h2>
+
+<p>The Java package has added a number of new features.  Users can now validate
+vectors more thoroughly, at the cost of extra processing time.  In
+dictionary encoding, dictionary indices can be expressed as unsigned integers.
+A framework for data compression has been set up for IPC.</p>
+
+<p>The calculation for vector capacity has been simplified, so users should
+experience notable performance improvements for various ‘setSafe’ methods.</p>
+
+<p>Bugs for JDBC adapters, sort algorithms, and ComplexCopier have been 
resolved
+to make them more usable.</p>
+
+<h2 id="javascript-notes">JavaScript notes</h2>
+
+<p>This release upgrades Arrow’s build to TypeScript 3.9, fixing the generated <code class="highlighter-rouge">.d.ts</code> typings.</p>
+
+<h2 id="python-notes">Python notes</h2>
+
+<p>Parquet reading now supports round trips of arbitrarily nested data, and
+several bug fixes were made for writing nested data and FixedSizeList.  If you
+write data with these types, we recommend upgrading to 2.0 and validating old
+data, as there is potential for data loss.</p>
+
+<p>Extension types with a nested storage type now round trip through 
Parquet.</p>
+
+<p>The <code class="highlighter-rouge">pyarrow.filesystem</code> submodule is 
deprecated in favor of new filesystem
+implementations in <code class="highlighter-rouge">pyarrow.fs</code>.</p>
+
+<p>The custom serialization functionality (<code class="highlighter-rouge">pyarrow.serialize()</code>,
+<code class="highlighter-rouge">pyarrow.deserialize()</code>, etc.) is deprecated. Those functions provided a
+Python-specific (not cross-language) serialization format that was not
+compatible with the standardized Arrow (IPC) serialization format.  For
+arbitrary objects, you can use the standard library <code class="highlighter-rouge">pickle</code> functionality
+instead. For pyarrow objects, you can use the IPC serialization format through
+the <code class="highlighter-rouge">pyarrow.ipc</code> module.</p>
+
+<p>The <code class="highlighter-rouge">pyarrow.compute</code> module now has complete coverage of the available C++
+compute kernels in the Python API. Several new kernels have been added.</p>
+
+<p>The <code class="highlighter-rouge">pyarrow.dataset</code> module was 
further improved. In addition to reading, it
+is now also possible to write partitioned datasets (with <code 
class="highlighter-rouge">write_dataset()</code>).</p>
+
+<p>The Arrow &lt;-&gt; Python conversion code was refactored, fixing several 
bugs and
+corner cases.</p>
+
+<p>Conversion of an array of <code 
class="highlighter-rouge">pyarrow.MapType</code> to Pandas has been added.</p>
+
+<p>Conversion of timezone-aware datetimes to and from pyarrow arrays (including
+pandas) now round-trips while preserving the timezone. To restore the old behavior
+(e.g. for Spark), set the environment variable PYARROW_IGNORE_TIMEZONE to a truthy
+value (e.g. <code class="highlighter-rouge">PYARROW_IGNORE_TIMEZONE=1</code>).</p>
+
+<h2 id="r-notes">R notes</h2>
+
+<p>Highlights of the R release include</p>
+
+<ul>
+  <li>Writing multi-file datasets with partitioning to Parquet or Feather</li>
+  <li>Reading and writing directly to AWS S3, both individual files and 
multi-file
+datasets</li>
+  <li>Bindings for Flight which use reticulate</li>
+</ul>
+
+<p>In addition, the R package benefits from the various improvements in the C++
+library listed above, including the ability to read and write Parquet files
+with nested struct and list types.</p>
+
+<p>For more on what’s in the 2.0.0 R package, see the <a 
href="/docs/r/news/">R changelog</a>.</p>
+
+<h2 id="ruby-and-c-glib-notes">Ruby and C GLib notes</h2>
+
+<h3 id="ruby">Ruby</h3>
+
+<p>In the Ruby binding, <code class="highlighter-rouge">Arrow::Table#save</code> now uses the number of rows as the
+<code class="highlighter-rouge">chunk_size</code> parameter by default when the table is saved to a Parquet file.</p>
+
+<h3 id="c-glib">C GLib</h3>
+
+<p>The GLib binding newly supports <code 
class="highlighter-rouge">GArrowStringDictionaryArrayBuilder</code> and
+<code class="highlighter-rouge">GArrowBinaryDictionaryArrayBuilder</code>.</p>
+
+<p>Moreover, the GLib binding provides new accessors for <code class="highlighter-rouge">GArrowListArray</code> and
+<code class="highlighter-rouge">GArrowLargeListArray</code>.  They are <code class="highlighter-rouge">get_values</code>, <code class="highlighter-rouge">get_value_offset</code>,
+<code class="highlighter-rouge">get_value_length</code>, and <code class="highlighter-rouge">get_value_offsets</code>.</p>
+
+<h2 id="rust-notes">Rust notes</h2>
+
+<p>Due to the high volume of activity in the Rust subproject in this release,
+we’re writing a separate blog post dedicated to those changes.</p>
+
+
+      </main>
+    </div>
+
+    <hr/>
+<footer class="footer">
+  <div class="row">
+  <div class="col-md-9">
+    <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+    <p>&copy; 2016-2020 The Apache Software Foundation</p>
+  </div>
+  <div class="col-md-3">
+    <a class="d-sm-none d-md-inline pr-2" 
href="https://www.apache.org/events/current-event.html";>
+      <img src="https://www.apache.org/events/current-event-234x60.png"/>
+    </a>
+  </div>
+  <script type="text/javascript" 
src="/assets/main-18cd3029557f73c1ee82e41113127b04f6fcd84c56d9db0cb9c40ebe26ef6e33.js"
 integrity="sha256-GM0wKVV/c8HuguQRExJ7BPb82ExW2dsMucQOvibvbjM=" 
crossorigin="anonymous"></script>
+</footer>
+
+  </div>
+</body>
+</html>
diff --git a/blog/index.html b/blog/index.html
index b43dd33..eb7d72a 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -215,6 +215,21 @@
   
   <p>
     <h3>
+      <a href="/blog/2020/10/22/2.0.0-release/">Apache Arrow 2.0.0 Release</a>
+    </h3>
+    
+    <p>
+    <span class="blog-list-date">
+      22 October 2020
+    </span>
+    </p>
+    The Apache Arrow team is pleased to announce the 2.0.0 release. This 
covers over 3 months of development work and includes 511 resolved issues from 
81 distinct contributors. See the Install Page to learn how to get the 
libraries for your platform. The release notes below are not exhaustive and...
+  </p>
+  
+
+  
+  <p>
+    <h3>
       <a href="/blog/2020/07/29/cpp-build-simplification/">Making Arrow C++ 
Builds Simpler, Smaller, and Faster</a>
     </h3>
     
diff --git a/feed.xml b/feed.xml
index 7e1249b..06f0caf 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,169 @@
-<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom"; ><generator uri="https://jekyllrb.com/"; 
version="3.8.4">Jekyll</generator><link 
href="https://arrow.apache.org/feed.xml"; rel="self" type="application/atom+xml" 
/><link href="https://arrow.apache.org/"; rel="alternate" type="text/html" 
/><updated>2020-10-21T10:04:08-04:00</updated><id>https://arrow.apache.org/feed.xml</id><title
 type="html">Apache Arrow</title><subtitle>Apache Arrow is a cross-language 
developm [...]
+<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom"; ><generator uri="https://jekyllrb.com/"; 
version="3.8.4">Jekyll</generator><link 
href="https://arrow.apache.org/feed.xml"; rel="self" type="application/atom+xml" 
/><link href="https://arrow.apache.org/"; rel="alternate" type="text/html" 
/><updated>2020-10-23T11:24:09-04:00</updated><id>https://arrow.apache.org/feed.xml</id><title
 type="html">Apache Arrow</title><subtitle>Apache Arrow is a cross-language 
developm [...]
+
+--&gt;
+
+&lt;p&gt;The Apache Arrow team is pleased to announce the 2.0.0 release. This 
covers
+over 3 months of development work and includes &lt;a 
href=&quot;https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%202.0.0&quot;&gt;&lt;strong&gt;511
 resolved issues&lt;/strong&gt;&lt;/a&gt;
+from &lt;a 
href=&quot;/release/2.0.0.html#contributors&quot;&gt;&lt;strong&gt;81 distinct 
contributors&lt;/strong&gt;&lt;/a&gt;. See the Install Page to learn how to
+get the libraries for your platform.&lt;/p&gt;
+
+&lt;p&gt;The release notes below are not exhaustive and only expose selected 
highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the &lt;a href=&quot;/release/2.0.0.html&quot;&gt;complete 
changelog&lt;/a&gt;.&lt;/p&gt;
+
+&lt;h2 id=&quot;community&quot;&gt;Community&lt;/h2&gt;
+
+&lt;p&gt;Since the 1.0.0 release, &lt;a 
href=&quot;https://github.com/jorgecarleitao&quot;&gt;Jorge Leitão&lt;/a&gt; 
has been added as a committer. Thank
+you for your contributions!&lt;/p&gt;
+
+&lt;h2 id=&quot;columnar-format&quot;&gt;Columnar Format&lt;/h2&gt;
+
+&lt;p&gt;As this is the first major release since 1.0.0, we remind everyone that we have
+moved to a “split” versioning system in which the Library version (now
+2.0.0) evolves separately from the Format version (still
+1.0.0). Major releases of the libraries may contain non-backward-compatible API
+changes, but they will not contain any incompatible format changes. See the
+&lt;a href=&quot;http://arrow.apache.org/docs/format/Versioning.html&quot;&gt;Versioning and Stability&lt;/a&gt; page in the documentation for more details.&lt;/p&gt;
+
+&lt;p&gt;The columnar format metadata has been updated to permit 256-bit 
decimal values
+in addition to 128-bit decimals. This change is backward and forward
+compatible.&lt;/p&gt;
+
+&lt;h2 id=&quot;arrow-flight-rpc-notes&quot;&gt;Arrow Flight RPC 
notes&lt;/h2&gt;
+
+&lt;p&gt;For Arrow Flight, 2.0.0 mostly brings bugfixes. In Java, some memory 
leaks in
+&lt;code class=&quot;highlighter-rouge&quot;&gt;FlightStream&lt;/code&gt; and 
&lt;code class=&quot;highlighter-rouge&quot;&gt;DoPut&lt;/code&gt; have been 
addressed. In C++ and Python, a deadlock
+has been fixed in an edge case. Additionally, when supported by gRPC, TLS
+verification can be disabled.&lt;/p&gt;
+
+&lt;h2 id=&quot;c-notes&quot;&gt;C++ notes&lt;/h2&gt;
+
+&lt;p&gt;Parquet reading now fully supports round trips of arbitrarily nested data,
+including extension types with a nested storage type. In the process, several
+bugs in writing nested data and FixedSizeList were fixed.  If you write data with
+these types, we recommend upgrading to this release and validating old data, as
+there is potential for data loss.&lt;/p&gt;
+
+&lt;p&gt;Datasets can now be written with partitioning, including these features:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Writing to Parquet, including control over accumulation of 
statistics for
+individual columns.&lt;/li&gt;
+  &lt;li&gt;Writing to IPC/Feather, including body buffer 
compression.&lt;/li&gt;
+  &lt;li&gt;Basenames of written files can be specified with a string 
template, allowing
+non-colliding writes into the same partitioned dataset.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Other notable features in the release include&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Compute kernels for standard deviation, variance, and 
mode&lt;/li&gt;
+  &lt;li&gt;Improvements to S3 support, including automatic region 
detection&lt;/li&gt;
+  &lt;li&gt;CSV reading now parses the Date type and can create Dictionary types&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2 id=&quot;c-notes-1&quot;&gt;C# notes&lt;/h2&gt;
+
+&lt;p&gt;The .NET package has added a number of new features this 
release.&lt;/p&gt;
+
+&lt;p&gt;Full support for &lt;code 
class=&quot;highlighter-rouge&quot;&gt;Struct&lt;/code&gt; types.&lt;/p&gt;
+
+&lt;p&gt;Synchronous write APIs for &lt;code class=&quot;highlighter-rouge&quot;&gt;ArrowStreamWriter&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;ArrowFileWriter&lt;/code&gt;. These are
+complementary to the existing async write APIs, and can be used in situations
+where the async APIs can’t be used.&lt;/p&gt;
+
+&lt;p&gt;The ability to use &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DateTime&lt;/code&gt; instances with 
&lt;code class=&quot;highlighter-rouge&quot;&gt;Date32Array&lt;/code&gt; and 
&lt;code 
class=&quot;highlighter-rouge&quot;&gt;Date64Array&lt;/code&gt;.&lt;/p&gt;
+
+&lt;h2 id=&quot;java-notes&quot;&gt;Java notes&lt;/h2&gt;
+
+&lt;p&gt;The Java package has added a number of new features.  Users can now validate
+vectors more thoroughly, at the cost of extra processing time.  In
+dictionary encoding, dictionary indices can be expressed as unsigned integers.
+A framework for data compression has been set up for IPC.&lt;/p&gt;
+
+&lt;p&gt;The calculation for vector capacity has been simplified, so users 
should
+experience notable performance improvements for various ‘setSafe’ 
methods.&lt;/p&gt;
+
+&lt;p&gt;Bugs for JDBC adapters, sort algorithms, and ComplexCopier have been 
resolved
+to make them more usable.&lt;/p&gt;
+
+&lt;h2 id=&quot;javascript-notes&quot;&gt;JavaScript notes&lt;/h2&gt;
+
+&lt;p&gt;Upgrades Arrow’s build to use TypeScript 3.9, fixing generated 
&lt;code class=&quot;highlighter-rouge&quot;&gt;.d.ts&lt;/code&gt; 
typings.&lt;/p&gt;
+
+&lt;h2 id=&quot;python-notes&quot;&gt;Python notes&lt;/h2&gt;
+
+&lt;p&gt;Parquet reading now supports round trips of arbitrarily nested data. 
Several bugs
+in writing nested data and FixedSizeList have been fixed.  If you write data
+with these types, we recommend validating old data (there is potential for some
+data loss) and upgrading to 2.0.&lt;/p&gt;
+
+&lt;p&gt;Extension types with a nested storage type now round trip through 
Parquet.&lt;/p&gt;
+
+&lt;p&gt;The &lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.filesystem&lt;/code&gt; 
submodule is deprecated in favor of new filesystem
+implementations in &lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.fs&lt;/code&gt;.&lt;/p&gt;
+
+&lt;p&gt;The custom serialization functionality (&lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.serialize()&lt;/code&gt;,
+&lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.deserialize()&lt;/code&gt;, etc) 
is deprecated. Those functions provided a
+Python-specific (not cross-language) serialization format that was not
+compatible with the standardized Arrow (IPC) serialization format.  For
+arbitrary objects, you can use the standard library &lt;code 
class=&quot;highlighter-rouge&quot;&gt;pickle&lt;/code&gt; functionality
+instead. For pyarrow objects, you can use the IPC serialization format through
+the &lt;code class=&quot;highlighter-rouge&quot;&gt;pyarrow.ipc&lt;/code&gt; 
module, as explained above.&lt;/p&gt;
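For arbitrary objects, the recommended pickle replacement needs nothing beyond the standard library. A minimal sketch (the sample object below is illustrative, not from the release):

```python
# Standard-library replacement for the deprecated pyarrow.serialize()/
# pyarrow.deserialize() pair when handling arbitrary Python objects.
import pickle

obj = {"name": "arrow", "versions": [0.15, 1.0, 2.0]}  # any picklable object

buf = pickle.dumps(obj)       # serialize to bytes (files, sockets, caches)
restored = pickle.loads(buf)  # reconstruct the original object

assert restored == obj
```

Note that pickle, like the deprecated functions, is Python-specific; for cross-language interchange of pyarrow objects, use the IPC format instead.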
+
+&lt;p&gt;The &lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.compute&lt;/code&gt; module now 
has complete coverage of the available C++
+compute kernels in the Python API. Several new kernels have been 
added.&lt;/p&gt;
+
+&lt;p&gt;The &lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.dataset&lt;/code&gt; module was 
further improved. In addition to reading, it
+is now also possible to write partitioned datasets (with &lt;code 
class=&quot;highlighter-rouge&quot;&gt;write_dataset()&lt;/code&gt;).&lt;/p&gt;
+
+&lt;p&gt;The Arrow &amp;lt;-&amp;gt; Python conversion code was refactored, 
fixing several bugs and
+corner cases.&lt;/p&gt;
+
+&lt;p&gt;Conversion of an array of &lt;code 
class=&quot;highlighter-rouge&quot;&gt;pyarrow.MapType&lt;/code&gt; to Pandas 
has been added.&lt;/p&gt;
+
+&lt;p&gt;Conversions of timezone-aware datetimes to and from pyarrow arrays 
(including
+via pandas) now round-trip with the timezone preserved. To restore the old
+behavior (e.g. for Spark), set the environment variable PYARROW_IGNORE_TIMEZONE
+to a truthy value (e.g. &lt;code 
class=&quot;highlighter-rouge&quot;&gt;PYARROW_IGNORE_TIMEZONE=1&lt;/code&gt;).&lt;/p&gt;
+
+&lt;h2 id=&quot;r-notes&quot;&gt;R notes&lt;/h2&gt;
+
+&lt;p&gt;Highlights of the R release include&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Writing multi-file datasets with partitioning to Parquet or 
Feather&lt;/li&gt;
+  &lt;li&gt;Reading and writing directly to AWS S3, both individual files and 
multi-file
+datasets&lt;/li&gt;
+  &lt;li&gt;Bindings for Flight which use reticulate&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;In addition, the R package benefits from the various improvements in 
the C++
+library listed above, including the ability to read and write Parquet files
+with nested struct and list types.&lt;/p&gt;
+
+&lt;p&gt;For more on what’s in the 2.0.0 R package, see the &lt;a 
href=&quot;/docs/r/news/&quot;&gt;R changelog&lt;/a&gt;.&lt;/p&gt;
+
+&lt;h2 id=&quot;ruby-and-c-glib-notes&quot;&gt;Ruby and C GLib notes&lt;/h2&gt;
+
+&lt;h3 id=&quot;ruby&quot;&gt;Ruby&lt;/h3&gt;
+
+&lt;p&gt;In the Ruby binding, &lt;code 
class=&quot;highlighter-rouge&quot;&gt;Arrow::Table#save&lt;/code&gt; uses the 
number of rows as the
+&lt;code class=&quot;highlighter-rouge&quot;&gt;chunk_size&lt;/code&gt; 
parameter by default when the table is saved in a Parquet file.&lt;/p&gt;
+
+&lt;h3 id=&quot;c-glib&quot;&gt;C GLib&lt;/h3&gt;
+
+&lt;p&gt;The GLib binding newly supports &lt;code 
class=&quot;highlighter-rouge&quot;&gt;GArrowStringDictionaryArrayBuilder&lt;/code&gt;
 and
+&lt;code 
class=&quot;highlighter-rouge&quot;&gt;GArrowBinaryDictionaryArrayBuilder&lt;/code&gt;.&lt;/p&gt;
+
+&lt;p&gt;Moreover, the GLib binding supports new accessors for &lt;code 
class=&quot;highlighter-rouge&quot;&gt;GArrowListArray&lt;/code&gt; and
+&lt;code 
class=&quot;highlighter-rouge&quot;&gt;GArrowLargeListArray&lt;/code&gt;.  They 
are &lt;code class=&quot;highlighter-rouge&quot;&gt;get_values&lt;/code&gt;, 
&lt;code class=&quot;highlighter-rouge&quot;&gt;get_value_offset&lt;/code&gt;,
+&lt;code class=&quot;highlighter-rouge&quot;&gt;get_value_length&lt;/code&gt;, 
and &lt;code 
class=&quot;highlighter-rouge&quot;&gt;get_value_offsets&lt;/code&gt;.&lt;/p&gt;
+
+&lt;h2 id=&quot;rust-notes&quot;&gt;Rust notes&lt;/h2&gt;
+
+&lt;p&gt;Due to the high volume of activity in the Rust subproject in this 
release,
+we’re writing a separate blog post dedicated to those 
changes.&lt;/p&gt;</content><author><name>pmc</name></author><category 
term="release" /><summary type="html">The Apache Arrow team is pleased to 
announce the 2.0.0 release. This covers over 3 months of development work and 
includes 511 resolved issues from 81 distinct contributors. See the Install 
Page to learn how to get the libraries for your platform. The release notes 
below are not exhaustive and only expose selected highlights of [...]
 
 --&gt;
 
@@ -1917,206 +2082,4 @@ batches in R.&lt;/p&gt;
 
 &lt;p&gt;There are a number of active discussions ongoing on the developer
 [email protected] mailing list. We look forward to hearing from the
-community there.&lt;/p&gt;</content><author><name>pmc</name></author><category 
term="release" /><summary type="html">The Apache Arrow team is pleased to 
announce the 0.15.0 release. This covers about 3 months of development work and 
includes 687 resolved issues from 80 distinct contributors. See the Install 
Page to learn how to get the libraries for your platform. The complete 
changelog is also available. About a third of issues closed (240) were 
classified as bug fixes, so this release  [...]
-
---&gt;
-
-&lt;p&gt;We have been implementing a series of optimizations in the Apache 
Parquet C++
-internals to improve read and write efficiency (both performance and memory
-use) for Arrow columnar binary and string data, with new “native” support for
-Arrow’s dictionary types. This should have a big impact on users of the C++,
-MATLAB, Python, R, and Ruby interfaces to Parquet files.&lt;/p&gt;
-
-&lt;p&gt;This post reviews work that was done and shows benchmarks comparing 
Arrow
-0.12.1 with the current development version (to be released soon as Arrow
-0.15.0).&lt;/p&gt;
-
-&lt;h1 id=&quot;summary-of-work&quot;&gt;Summary of work&lt;/h1&gt;
-
-&lt;p&gt;One of the largest and most complex optimizations involves encoding 
and
-decoding Parquet files’ internal dictionary-encoded data streams to and from
-Arrow’s in-memory dictionary-encoded &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryArray&lt;/code&gt;
-representation. Dictionary encoding is a compression strategy in Parquet, and
-there is no formal “dictionary” or “categorical” type. I will go into more
-detail about this below.&lt;/p&gt;
-
-&lt;p&gt;Some of the particular JIRA issues related to this work 
include:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Vectorize comparators for computing statistics (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/PARQUET-1523&quot;&gt;PARQUET-1523&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Read binary data directly into dictionary builder
-(&lt;a 
href=&quot;https://issues.apache.org/jira/browse/ARROW-3769&quot;&gt;ARROW-3769&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Writing Parquet’s dictionary indices directly into dictionary 
builder
-(&lt;a 
href=&quot;https://issues.apache.org/jira/browse/ARROW-3772&quot;&gt;ARROW-3772&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Write dense (non-dictionary) Arrow arrays directly into Parquet 
data encoders
-(&lt;a 
href=&quot;https://issues.apache.org/jira/browse/ARROW-6152&quot;&gt;ARROW-6152&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Direct writing of &lt;code 
class=&quot;highlighter-rouge&quot;&gt;arrow::DictionaryArray&lt;/code&gt; to 
Parquet column writers (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/ARROW-3246&quot;&gt;ARROW-3246&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Supporting changing dictionaries (&lt;a 
href=&quot;https://issues.apache.org/jira/browse/ARROW-3144&quot;&gt;ARROW-3144&lt;/a&gt;)&lt;/li&gt;
-  &lt;li&gt;Internal IO optimizations and improved raw &lt;code 
class=&quot;highlighter-rouge&quot;&gt;BYTE_ARRAY&lt;/code&gt; encoding 
performance
-(&lt;a 
href=&quot;https://issues.apache.org/jira/browse/ARROW-4398&quot;&gt;ARROW-4398&lt;/a&gt;)&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;One of the challenges of developing the Parquet C++ library is that 
we maintain
-low-level read and write APIs that do not involve the Arrow columnar data
-structures. So we have had to take care to implement Arrow-related
-optimizations without impacting non-Arrow Parquet users, which includes
-database systems like Clickhouse and Vertica.&lt;/p&gt;
-
-&lt;h1 
id=&quot;background-how-parquet-files-do-dictionary-encoding&quot;&gt;Background:
 how Parquet files do dictionary encoding&lt;/h1&gt;
-
-&lt;p&gt;Many direct and indirect users of Apache Arrow use dictionary 
encoding to
-improve performance and memory use on binary or string data types that include
-many repeated values. MATLAB or pandas users will know this as the Categorical
-type (see &lt;a 
href=&quot;https://www.mathworks.com/help/matlab/categorical-arrays.html&quot;&gt;MATLAB
 docs&lt;/a&gt; or &lt;a 
href=&quot;https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html&quot;&gt;pandas
 docs&lt;/a&gt;) while in R such encoding is
-known as &lt;a 
href=&quot;https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html&quot;&gt;&lt;code
 class=&quot;highlighter-rouge&quot;&gt;factor&lt;/code&gt;&lt;/a&gt;. In the 
Arrow C++ library and various bindings we have
-the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryArray&lt;/code&gt; object for 
representing such data in memory.&lt;/p&gt;
-
-&lt;p&gt;For example, an array such as&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div 
class=&quot;highlight&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;['apple', 'orange', 'apple', NULL, 
'orange', 'orange']
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
-
-&lt;p&gt;has dictionary-encoded form&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div 
class=&quot;highlight&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;dictionary: ['apple', 'orange']
-indices: [0, 1, 0, NULL, 1, 1]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
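The encoding above can be reproduced with a short sketch (a toy illustration of the idea, not the Arrow C++ implementation):

```python
def dictionary_encode(values):
    """Dictionary-encode a sequence: return (dictionary, indices).

    None values stay None in the indices, mirroring how nulls are
    represented separately from dictionary entries.
    """
    dictionary = []
    lookup = {}   # value -> index into dictionary
    indices = []
    for v in values:
        if v is None:
            indices.append(None)
            continue
        if v not in lookup:
            lookup[v] = len(dictionary)
            dictionary.append(v)
        indices.append(lookup[v])
    return dictionary, indices

dictionary, indices = dictionary_encode(
    ['apple', 'orange', 'apple', None, 'orange', 'orange'])
# dictionary == ['apple', 'orange']
# indices == [0, 1, 0, None, 1, 1]
```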
-
-&lt;p&gt;The &lt;a 
href=&quot;https://github.com/apache/parquet-format/blob/master/Encodings.md&quot;&gt;Parquet
 format uses dictionary encoding&lt;/a&gt; to compress data, and it is
-used for all Parquet data types, not just binary or string data. Parquet
-further uses bit-packing and run-length encoding (RLE) to compress the
-dictionary indices, so if you had data like&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div 
class=&quot;highlight&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;['apple', 'apple', 'apple', 'apple', 
'apple', 'apple', 'orange']
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
-
-&lt;p&gt;the indices would be encoded like&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div 
class=&quot;highlight&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;[rle-run=(6, 0),
- bit-packed-run=[1]]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
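The run-length part of this can be sketched as follows (a toy model only; Parquet's actual encoding interleaves RLE runs with bit-packed runs and packs indices at a computed bit width, per the format spec):

```python
def rle_encode(indices):
    """Toy run-length encoding of dictionary indices as (value, count) runs."""
    runs = []
    for v in indices:
        if runs and runs[-1][0] == v:
            # Extend the current run of identical values.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((v, 1))
    return runs

# The seven values above have dictionary indices [0]*6 + [1]:
runs = rle_encode([0, 0, 0, 0, 0, 0, 1])
# runs == [(0, 6), (1, 1)], i.e. a run of six 0s followed by a single 1
```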
-
-&lt;p&gt;The full details of the rle-bitpacking encoding are found in the 
&lt;a 
href=&quot;https://github.com/apache/parquet-format/blob/master/Encodings.md&quot;&gt;Parquet
-specification&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;When writing a Parquet file, most implementations will use dictionary 
encoding
-to compress a column until the dictionary itself reaches a certain size
-threshold, usually around 1 megabyte. At this point, the column writer will
-“fall back” to &lt;code 
class=&quot;highlighter-rouge&quot;&gt;PLAIN&lt;/code&gt; encoding where values 
are written end-to-end in “data
-pages” and then usually compressed with Snappy or Gzip. See the following rough
-diagram:&lt;/p&gt;
-
-&lt;div align=&quot;center&quot;&gt;
-&lt;img src=&quot;/img/20190903-parquet-dictionary-column-chunk.png&quot; 
alt=&quot;Internal ColumnChunk structure&quot; width=&quot;80%&quot; 
class=&quot;img-responsive&quot; /&gt;
-&lt;/div&gt;
-
-&lt;h1 
id=&quot;faster-reading-and-writing-of-dictionary-encoded-data&quot;&gt;Faster 
reading and writing of dictionary-encoded data&lt;/h1&gt;
-
-&lt;p&gt;When reading a Parquet file, the dictionary-encoded portions are 
usually
-materialized to their non-dictionary-encoded form, causing binary or string
-values to be duplicated in memory. So an obvious (but not trivial) optimization
-is to skip this “dense” materialization. There are several issues to deal 
with:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;A Parquet file often contains multiple ColumnChunks for each 
semantic column,
-and the dictionary values may be different in each ColumnChunk&lt;/li&gt;
-  &lt;li&gt;We must gracefully handle the “fall back” portion which is not
-dictionary-encoded&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;We pursued several avenues to help with this:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Allowing each &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryArray&lt;/code&gt; to have a 
different dictionary (before, the
-dictionary was part of the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryType&lt;/code&gt;, which 
caused problems)&lt;/li&gt;
-  &lt;li&gt;We enabled the Parquet dictionary indices to be directly written 
into an
-Arrow &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryBuilder&lt;/code&gt; without 
rehashing the data&lt;/li&gt;
-  &lt;li&gt;When decoding a ColumnChunk, we first append the dictionary values 
and
-indices into an Arrow &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryBuilder&lt;/code&gt;, and when 
we encounter the “fall
-back” portion we use a hash table to convert those values to
-dictionary-encoded form&lt;/li&gt;
-  &lt;li&gt;We override the “fall back” logic when writing a ColumnChunk from 
a
-&lt;code class=&quot;highlighter-rouge&quot;&gt;DictionaryArray&lt;/code&gt; 
so that reading such data back is more efficient&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;All of these things together have produced some excellent performance 
results
-that we will detail below.&lt;/p&gt;
-
-&lt;p&gt;The other class of optimizations we implemented was removing an 
abstraction
-layer between the low-level Parquet column data encoder and decoder classes and
-the Arrow columnar data structures. This involves:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Adding &lt;code 
class=&quot;highlighter-rouge&quot;&gt;ColumnWriter::WriteArrow&lt;/code&gt; 
and &lt;code class=&quot;highlighter-rouge&quot;&gt;Encoder::Put&lt;/code&gt; 
methods that accept
-&lt;code class=&quot;highlighter-rouge&quot;&gt;arrow::Array&lt;/code&gt; 
objects directly&lt;/li&gt;
-  &lt;li&gt;Adding &lt;code 
class=&quot;highlighter-rouge&quot;&gt;ByteArrayDecoder::DecodeArrow&lt;/code&gt;
 method to decode binary data directly
-into an &lt;code 
class=&quot;highlighter-rouge&quot;&gt;arrow::BinaryBuilder&lt;/code&gt;.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;While the performance improvements from this work are less dramatic 
than for
-dictionary-encoded data, they are still meaningful in real-world 
applications.&lt;/p&gt;
-
-&lt;h1 id=&quot;performance-benchmarks&quot;&gt;Performance 
Benchmarks&lt;/h1&gt;
-
-&lt;p&gt;We ran some benchmarks comparing Arrow 0.12.1 with the current master
-branch. We construct two kinds of Arrow tables with 10 columns each:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;“Low cardinality” and “high cardinality” variants. The “low 
cardinality” case
-has 1,000 unique string values of 32-bytes each. The “high cardinality” has
-100,000 unique values&lt;/li&gt;
-  &lt;li&gt;“Dense” (non-dictionary) and “Dictionary” variants&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;&lt;a 
href=&quot;https://gist.github.com/wesm/b4554e2d6028243a30eeed2c644a9066&quot;&gt;See
 the full benchmark script.&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;We show both single-threaded and multithreaded read performance. The 
test
-machine is an Intel i9-9960X using gcc 8.3.0 (on Ubuntu 18.04) with 16 physical
-cores and 32 virtual cores. All time measurements are reported in seconds, but
-we are most interested in showing the relative performance.&lt;/p&gt;
-
-&lt;p&gt;First, the writing benchmarks:&lt;/p&gt;
-
-&lt;div align=&quot;center&quot;&gt;
-&lt;img src=&quot;/img/20190903_parquet_write_perf.png&quot; alt=&quot;Parquet 
write benchmarks&quot; width=&quot;80%&quot; class=&quot;img-responsive&quot; 
/&gt;
-&lt;/div&gt;
-
-&lt;p&gt;Writing &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryArray&lt;/code&gt; is 
dramatically faster due to the optimizations
-described above. We have achieved a small improvement in writing dense
-(non-dictionary) binary arrays.&lt;/p&gt;
-
-&lt;p&gt;Then, the reading benchmarks:&lt;/p&gt;
-
-&lt;div align=&quot;center&quot;&gt;
-&lt;img src=&quot;/img/20190903_parquet_read_perf.png&quot; alt=&quot;Parquet 
read benchmarks&quot; width=&quot;80%&quot; class=&quot;img-responsive&quot; 
/&gt;
-&lt;/div&gt;
-
-&lt;p&gt;Here, similarly reading &lt;code 
class=&quot;highlighter-rouge&quot;&gt;DictionaryArray&lt;/code&gt; directly is 
many times faster.&lt;/p&gt;
-
-&lt;p&gt;These benchmarks show that parallel reads of dense binary data may be 
slightly
-slower though single-threaded reads are now faster. We may want to do some
-profiling and see what we can do to bring read performance back in
-line. Optimizing the dense read path has not been too much of a priority
-relative to the dictionary read path in this work.&lt;/p&gt;
-
-&lt;h1 id=&quot;memory-use-improvements&quot;&gt;Memory Use 
Improvements&lt;/h1&gt;
-
-&lt;p&gt;In addition to faster performance, reading columns as 
dictionary-encoded can
-yield significantly less memory use.&lt;/p&gt;
-
-&lt;p&gt;In the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;dict-random&lt;/code&gt; case above, we 
found that the master branch uses 405 MB of
-RAM at peak while loading a 152 MB dataset. In v0.12.1, loading the same
-Parquet file without the accelerated dictionary support uses 1.94 GB of peak
-memory while the resulting non-dictionary table occupies 1.01 GB.&lt;/p&gt;
-
-&lt;p&gt;Note that we had a memory overuse bug in versions 0.14.0 and 0.14.1 
fixed in
-ARROW-6060, so if you are hitting this bug you will want to upgrade to 0.15.0
-as soon as it comes out.&lt;/p&gt;
-
-&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
-
-&lt;p&gt;There are still many Parquet-related optimizations that we may pursue 
in the
-future, but the ones here can be very helpful to people working with
-string-heavy datasets, both in performance and memory use. If you’d like to
-discuss this development work, we’d be glad to hear from you on our developer
-mailing list 
[email protected].&lt;/p&gt;</content><author><name>wesm</name></author><category
 term="application" /><summary type="html">We have been implementing a series 
of optimizations in the Apache Parquet C++ internals to improve read and write 
efficiency (both performance and memory use) for Arrow columnar binary and 
string data, with new “native” support for Arrow’s dictionary types. This 
should have a big impact on users of the C++, MATLAB, Python, R, and Ruby 
interfaces to P [...]
\ No newline at end of file
+community there.&lt;/p&gt;</content><author><name>pmc</name></author><category 
term="release" /><summary type="html">The Apache Arrow team is pleased to 
announce the 0.15.0 release. This covers about 3 months of development work and 
includes 687 resolved issues from 80 distinct contributors. See the Install 
Page to learn how to get the libraries for your platform. The complete 
changelog is also available. About a third of issues closed (240) were 
classified as bug fixes, so this release  [...]
\ No newline at end of file
