Repository: arrow-site Updated Branches: refs/heads/asf-site 6a8b4465c -> 0a7dc4187
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/0a7dc418/docs/ipc.html ---------------------------------------------------------------------- diff --git a/docs/ipc.html b/docs/ipc.html index 6d96632..c480fea 100644 --- a/docs/ipc.html +++ b/docs/ipc.html @@ -145,7 +145,7 @@ <ul> <li>A length prefix indicating the metadata size</li> - <li>The message metadata as a <a href="https://github.com/google/flatbuffers">Flatbuffer</a></li> + <li>The message metadata as a <a href="https://github.com/google/flatbuffers">Flatbuffer</a></li> <li>Padding bytes to an 8-byte boundary</li> <li>The message body, which must be a multiple of 8 bytes</li> </ul> @@ -190,9 +190,7 @@ flatbuffer union), and the size of the message body:</p> of encapsulated messages, each of which follows the format above. The schema comes first in the stream, and it is the same for all of the record batches that follow. If any fields in the schema are dictionary-encoded, one or more -<code class="highlighter-rouge">DictionaryBatch</code> messages will be included. <code class="highlighter-rouge">DictionaryBatch</code> and -<code class="highlighter-rouge">RecordBatch</code> messages may be interleaved, but before any dictionary key is used -in a <code class="highlighter-rouge">RecordBatch</code> it should be defined in a <code class="highlighter-rouge">DictionaryBatch</code>.</p> +<code class="highlighter-rouge">DictionaryBatch</code> messages will follow the schema.</p> <div class="highlighter-rouge"><pre class="highlight"><code><SCHEMA> <DICTIONARY 0> @@ -200,10 +198,6 @@ in a <code class="highlighter-rouge">RecordBatch</code> it should be defined in <DICTIONARY k - 1> <RECORD BATCH 0> ... -<DICTIONARY x DELTA> -... -<DICTIONARY y DELTA> -... 
<RECORD BATCH n - 1> <EOS [optional]: int32> </code></pre> @@ -238,10 +232,6 @@ footer.</p> </code></pre> </div> -<p>In the file format, there is no requirement that dictionary keys should be -defined in a <code class="highlighter-rouge">DictionaryBatch</code> before they are used in a <code class="highlighter-rouge">RecordBatch</code>, as long -as the keys are defined somewhere in the file.</p> - <h3 id="recordbatch-body-structure">RecordBatch body structure</h3> <p>The <code class="highlighter-rouge">RecordBatch</code> metadata contains a depth-first (pre-order) flattened set of @@ -315,7 +305,6 @@ the dictionaries can be properly interpreted.</p> <div class="highlighter-rouge"><pre class="highlight"><code>table DictionaryBatch { id: long; data: RecordBatch; - isDelta: boolean = false; } </code></pre> </div> @@ -325,38 +314,6 @@ in the schema, so that dictionaries can even be used for multiple fields. See the <a href="https://github.com/apache/arrow/blob/master/format/Layout.md">Physical Layout</a> document for more about the semantics of dictionary-encoded data.</p> -<p>The dictionary <code class="highlighter-rouge">isDelta</code> flag allows dictionary batches to be modified -mid-stream. A dictionary batch with <code class="highlighter-rouge">isDelta</code> set indicates that its vector -should be concatenated with those of any previous batches with the same <code class="highlighter-rouge">id</code>. 
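The encapsulated message framing described in this document (an int32 length prefix, Flatbuffer metadata, padding to an 8-byte boundary, then a body that is a multiple of 8 bytes) can be sketched in a few lines of Python. This is only an illustration of the padding arithmetic, not a real Arrow writer: the metadata bytes below are a placeholder, and the precise semantics of the length prefix are defined by the Flatbuffer schema in Message.fbs.

```python
import struct

def frame_message(metadata: bytes, body: bytes) -> bytes:
    """Illustrate the framing: int32 length prefix, metadata bytes,
    zero-padding to an 8-byte boundary, then the message body."""
    if len(body) % 8 != 0:
        raise ValueError("message body must be a multiple of 8 bytes")
    prefix_and_metadata = struct.pack("<i", len(metadata)) + metadata
    padding = (-len(prefix_and_metadata)) % 8  # pad to the next 8-byte boundary
    return prefix_and_metadata + b"\x00" * padding + body

framed = frame_message(b"fake-flatbuffer", b"8bytes!!")
assert len(framed) % 8 == 0
```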
A -stream which encodes one column, the list of strings -<code class="highlighter-rouge">["A", "B", "C", "B", "D", "C", "E", "A"]</code>, with a delta dictionary batch could -take the form:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code><SCHEMA> -<DICTIONARY 0> -(0) "A" -(1) "B" -(2) "C" - -<RECORD BATCH 0> -0 -1 -2 -1 - -<DICTIONARY 0 DELTA> -(3) "D" -(4) "E" - -<RECORD BATCH 1> -3 -2 -4 -0 -EOS -</code></pre> -</div> - <h3 id="tensor-multi-dimensional-array-message-format">Tensor (Multi-dimensional Array) Message Format</h3> <p>The <code class="highlighter-rouge">Tensor</code> message type provides a way to write a multidimensional array of http://git-wip-us.apache.org/repos/asf/arrow-site/blob/0a7dc418/docs/memory_layout.html ---------------------------------------------------------------------- diff --git a/docs/memory_layout.html b/docs/memory_layout.html index 0eb8d03..16a43ea 100644 --- a/docs/memory_layout.html +++ b/docs/memory_layout.html @@ -161,8 +161,9 @@ from <code class="highlighter-rouge">List<V></code> iff U and V are differ or a fully-specified nested type. When we say slot we mean a relative type value, not necessarily any physical storage region.</li> <li>Logical type: A data type that is implemented using some relative (physical) -type. For example, Decimal values are stored as 16 bytes in a fixed byte -size array. Similarly, strings can be stored as <code class="highlighter-rouge">List<1-byte></code>.</li> +type. For example, a Decimal value stored in 16 bytes could be stored in a +primitive array with slot size 16 bytes. Similarly, strings can be stored as +<code class="highlighter-rouge">List<1-byte></code>.</li> <li>Parent and child arrays: names to express relationships between physical value arrays in a nested type structure. 
For example, a <code class="highlighter-rouge">List<T></code>-type parent array has a T-type array as its child (see more on lists below).</li> @@ -751,9 +752,9 @@ the types array indicates that a slot contains a different type at the index <h2 id="dictionary-encoding">Dictionary encoding</h2> <p>When a field is dictionary encoded, the values are represented by an array of Int32 representing the index of the value in the dictionary. -The Dictionary is received as one or more DictionaryBatches with the id referenced by a dictionary attribute defined in the metadata (<a href="https://github.com/apache/arrow/blob/master/format/Message.fbs">Message.fbs</a>) in the Field table. -The dictionary has the same layout as the type of the field would dictate. Each entry in the dictionary can be accessed by its index in the DictionaryBatches. -When a Schema references a Dictionary id, it must send at least one DictionaryBatch for this id.</p> +The Dictionary is received as a DictionaryBatch whose id is referenced by a dictionary attribute defined in the metadata (<a href="https://github.com/apache/arrow/blob/master/format/Message.fbs">Message.fbs</a>) in the Field table. +The dictionary has the same layout as the type of the field would dictate. Each entry in the dictionary can be accessed by its index in the DictionaryBatch. 
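The index/dictionary split described here is easy to illustrate in plain Python. The following is a conceptual sketch of what dictionary encoding produces (integer indices plus a dictionary of distinct values), not the pyarrow API; it reuses the example column of strings from the IPC document above.

```python
def dictionary_encode(values):
    """Split a column into (indices, dictionary): each value is replaced
    by the index of its first occurrence in the dictionary."""
    dictionary, index_of, indices = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return indices, dictionary

indices, dictionary = dictionary_encode(["A", "B", "C", "B", "D", "C", "E", "A"])
# The record batch would carry the indices; the DictionaryBatch carries the values.
```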
+When a Schema references a Dictionary id, it must send a DictionaryBatch for this id before any RecordBatch.</p> <p>As an example, you could have the following data:</p> <div class="highlighter-rouge"><pre class="highlight"><code>type: List<String> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/0a7dc418/docs/metadata.html ---------------------------------------------------------------------- diff --git a/docs/metadata.html b/docs/metadata.html index 9b12883..9e25689 100644 --- a/docs/metadata.html +++ b/docs/metadata.html @@ -530,8 +530,7 @@ logical type, which have no children) and 3 buffers:</p> <h3 id="decimal">Decimal</h3> -<p>Decimals are represented as a 2’s complement 128-bit (16 byte) signed integer -in little-endian byte order.</p> +<p>TBD</p> <h3 id="timestamp">Timestamp</h3> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/0a7dc418/feed.xml ---------------------------------------------------------------------- diff --git a/feed.xml b/feed.xml index 27aeb5d..ea204d8 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,238 @@ -<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-12-18T19:07:25-08:00</updated><id>/</id><entry><title type="html">Fast Python Serialization with Ray and Apache Arrow</title><link href="/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow/" rel="alternate" type="text/html" title="Fast Python Serialization with Ray and Apache Arrow" /><published>2017-10-15T07:00:00-07:00</published><updated>2017-10-15T07:00:00-07:00</updated><id>/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow</id><content type="html" xml:base="/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow/"><!-- +<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator 
uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-12-19T10:30:45-05:00</updated><id>/</id><entry><title type="html">Apache Arrow 0.8.0 Release</title><link href="/blog/2017/12/18/0.8.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.8.0 Release" /><published>2017-12-18T23:01:00-05:00</published><updated>2017-12-18T23:01:00-05:00</updated><id>/blog/2017/12/18/0.8.0-release</id><content type="html" xml:base="/blog/2017/12/18/0.8.0-release/"><!-- + +--> + +<p>The Apache Arrow team is pleased to announce the 0.8.0 release. It is the +product of 10 weeks of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with +many new features and bug fixes to the various language implementations. This +is the largest release since 0.3.0 earlier this year.</p> + +<p>As part of work towards stabilizing the Arrow format and making a 1.0.0 +release sometime in 2018, we made a series of backwards-incompatible changes to +the serialized Arrow metadata that require Arrow readers and writers (0.7.1 +and earlier) to upgrade in order to be compatible with 0.8.0 and higher. We +expect backwards-incompatible changes to be rare going forward.</p> + +<p>See the <a href="https://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your +platform. The <a href="https://github.com/kou">complete changelog</a> is also available.</p> + +<p>We discuss some highlights from the release and other project news in this +post.</p> + +<h2 id="projects-powered-by-apache-arrow">Projects “Powered By” Apache Arrow</h2> + +<p>A growing ecosystem of projects is using Arrow to solve in-memory analytics +and data interchange problems. 
We have added a new <a href="http://arrow.apache.org/powered_by/">Powered By</a> page to the +Arrow website where we can acknowledge open source projects and companies which +are using Arrow. If you would like to add your project to the list as an Arrow +user, please let us know.</p> + +<h2 id="new-arrow-committers">New Arrow committers</h2> + +<p>Since the last release, we have added 5 new Apache committers:</p> + +<ul> + <li><a href="https://github.com/cpcloud">Phillip Cloud</a>, who has mainly contributed to C++ and Python</li> + <li><a href="https://github.com/BryanCutler">Bryan Cutler</a>, who has mainly contributed to Java and Spark integration</li> + <li><a href="https://github.com/icexelloss">Li Jin</a>, who has mainly contributed to Java and Spark integration</li> + <li><a href="https://github.com/trxcllnt">Paul Taylor</a>, who has mainly contributed to JavaScript</li> + <li><a href="https://github.com/siddharthteotia">Siddharth Teotia</a>, who has mainly contributed to Java</li> +</ul> + +<p>Welcome to the Arrow team, and thank you for your contributions!</p> + +<h2 id="improved-java-vector-api-performance-improvements">Improved Java vector API, performance improvements</h2> + +<p>Siddharth Teotia led efforts to revamp the Java vector API to make things +simpler and faster. 
As part of this, we removed the dichotomy between nullable +and non-nullable vectors.</p> + +<p>See <a href="https://arrow.apache.org/blog/2017/12/19/java-vector-improvements/">Sidd’s blog post</a> for more about these changes.</p> + +<h2 id="decimal-support-in-c-python-consistency-with-java">Decimal support in C++, Python, consistency with Java</h2> + +<p><a href="https://github.com/cpcloud">Phillip Cloud</a> led efforts this release to harden details about exact +decimal values in the Arrow specification and ensure a consistent +implementation across Java, C++, and Python.</p> + +<p>Arrow now supports decimals represented internally as a 128-bit little-endian +integer, with a set precision and scale (as defined in many SQL-based +systems). As part of this work, we needed to change Java’s internal +representation from big- to little-endian.</p> + +<p>We are now integration testing decimals between Java, C++, and Python, which +will facilitate Arrow adoption in Apache Spark and other systems that use both +Java and Python.</p> + +<p>Decimal data can now be read and written by the <a href="https://github.com/apache/parquet-cpp">Apache Parquet C++ +library</a>, including via pyarrow.</p> + +<p>In the future, we may implement support for smaller-precision decimals +represented by 32- or 64-bit integers.</p> + +<h2 id="c-improvements-expanded-kernels-library-and-more">C++ improvements: expanded kernels library and more</h2> + +<p>In C++, we have continued developing the new <code class="highlighter-rouge">arrow::compute</code> submodule +consisting of native computation functions for Arrow data. New contributor +<a href="https://github.com/licht-t">Licht Takeuchi</a> helped expand the supported types for type casting in +<code class="highlighter-rouge">compute::Cast</code>. 
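The 128-bit decimal representation described above can be sketched with standard-library integers: the value is scaled to an unscaled integer using the type's scale, then stored as a 16-byte two's-complement little-endian integer. This is a conceptual sketch of the layout, not the pyarrow or C++ API.

```python
from decimal import Decimal

def encode_decimal128(value: Decimal, scale: int) -> bytes:
    """Scale the decimal to an integer, then store it as a 128-bit
    (16-byte) little-endian two's-complement signed integer."""
    unscaled = int(value.scaleb(scale))  # e.g. -123.45 with scale 2 -> -12345
    return unscaled.to_bytes(16, "little", signed=True)

raw = encode_decimal128(Decimal("-123.45"), scale=2)
assert int.from_bytes(raw, "little", signed=True) == -12345
```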
We have also implemented new kernels <code class="highlighter-rouge">Unique</code> and +<code class="highlighter-rouge">DictionaryEncode</code> for computing the distinct elements of an array and +dictionary encoding (conversion to categorical), respectively.</p> + +<p>We expect the C++ computation “kernel” library to be a major expansion area for +the project over the next year and beyond. Here, we can also implement SIMD- +and GPU-accelerated versions of basic in-memory analytics functionality.</p> + +<p>As a minor breaking API change in C++, we have made the <code class="highlighter-rouge">RecordBatch</code> and <code class="highlighter-rouge">Table</code> +APIs “virtual” or abstract interfaces, to enable different implementations of a +record batch or table which conform to the standard interface. This will help +enable features like lazy IO or column loading.</p> + +<p>There was significant work improving the C++ library generally and supporting +work happening in Python and C. See the change log for full details.</p> + +<h2 id="glib-c-improvements-meson-build-gpu-support">GLib C improvements: Meson build, GPU support</h2> + +<p>Development of the GLib-based C bindings has generally tracked work happening in +the C++ library. 
These bindings are being used to develop <a href="https://github.com/red-data-tools">data science tools +for Ruby users</a> and elsewhere.</p> + +<p>The C bindings now support the <a href="https://mesonbuild.com">Meson build system</a> in addition to +autotools, which enables them to be built on Windows.</p> + +<p>The Arrow GPU extension library is now also supported in the C bindings.</p> + +<h2 id="javascript-first-independent-release-on-npm">JavaScript: first independent release on NPM</h2> + +<p><a href="https://github.com/TheNeuralBit">Brian Hulette</a> and <a href="https://github.com/trxcllnt">Paul Taylor</a> have been continuing to drive efforts +on the TypeScript-based JavaScript implementation.</p> + +<p>Since the last release, we made a first JavaScript-only Apache release, version +0.2.0, which is <a href="http://npmjs.org/package/apache-arrow">now available on NPM</a>. We decided to make separate +JavaScript releases to enable the JS library to release more frequently than +the rest of the project.</p> + +<h2 id="python-improvements">Python improvements</h2> + +<p>In addition to some of the new features mentioned above, we have made a variety +of usability and performance improvements for integrations with pandas, NumPy, +Dask, and other Python projects which may make use of pyarrow, the Arrow Python +library.</p> + +<p>Some of these improvements include:</p> + +<ul> + <li><a href="http://arrow.apache.org/docs/python/ipc.html">Component-based serialization</a> for more flexible and memory-efficient +transport of large or complex Python objects</li> + <li>Substantially improved serialization performance for pandas objects when +using <code class="highlighter-rouge">pyarrow.serialize</code> and <code class="highlighter-rouge">pyarrow.deserialize</code>. 
This includes a special +<code class="highlighter-rouge">pyarrow.pandas_serialization_context</code> which further accelerates certain +internal details of pandas serialization * Support zero-copy reads for</li> + <li><code class="highlighter-rouge">pandas.DataFrame</code> using <code class="highlighter-rouge">pyarrow.deserialize</code> for objects without Python +objects</li> + <li>Multithreaded conversions from <code class="highlighter-rouge">pandas.DataFrame</code> to <code class="highlighter-rouge">pyarrow.Table</code> (we +already supported multithreaded conversions from Arrow back to pandas)</li> + <li>More efficient conversion from 1-dimensional NumPy arrays to Arrow format</li> + <li>New generic buffer compression and decompression APIs <code class="highlighter-rouge">pyarrow.compress</code> and +<code class="highlighter-rouge">pyarrow.decompress</code></li> + <li>Enhanced Parquet cross-compatibility with <a href="https://github.com/dask/fastparquet">fastparquet</a> and improved Dask +support</li> + <li>Python support for accessing Parquet row group column statistics</li> +</ul> + +<h2 id="upcoming-roadmap">Upcoming Roadmap</h2> + +<p>The 0.8.0 release includes some API and format changes, but upcoming releases +will focus on ompleting and stabilizing critical functionality to move the +project closer to a 1.0.0 release.</p> + +<p>With the ecosystem of projects using Arrow expanding rapidly, we will be +working to improve and expand the libraries in support of downstream use cases.</p> + +<p>We continue to look for more JavaScript, Julia, R, Rust, and other programming +language developers to join the project and expand the available +implementations and bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Improvements to Java Vector API in Apache Arrow 0.8.0</title><link href="/blog/2017/12/19/java-vector-improvements/" rel="alternate" type="text/html" title="Improvements to Java Vector API in Apache 
Arrow 0.8.0" /><published>2017-12-18T19:00:00-05:00</published><updated>2017-12-18T19:00:00-05:00</updated><id>/blog/2017/12/19/java-vector-improvements</id><content type="html" xml:base="/blog/2017/12/19/java-vector-improvements/"><!-- + +--> + +<p>This post gives insight into the major improvements in the Java implementation +of vectors. We undertook this work over the last 10 weeks since the last Arrow +release.</p> + +<h2 id="design-goals">Design Goals</h2> + +<ol> + <li>Improved maintainability and extensibility</li> + <li>Improved heap memory usage</li> + <li>No performance overhead on hot code paths</li> +</ol> + +<h2 id="background">Background</h2> + +<h3 id="improved-maintainability-and-extensibility">Improved maintainability and extensibility</h3> + +<p>We use templates in several places for compile time Java code generation for +different vector classes, readers, writers etc. Templates are helpful as the +developers don’t have to write a lot of duplicate code.</p> + +<p>However, we realized that over a period of time some specific Java +templates became extremely complex with giant if-else blocks, poor code indentation +and documentation. All this impacted the ability to easily extend these templates +for adding new functionality or improving the existing infrastructure.</p> + +<p>So we evaluated the usage of templates for compile time code generation and +decided not to use complex templates in some places by writing a small amount of +duplicate code which is elegant, well documented and extensible.</p> + +<h3 id="improved-heap-usage">Improved heap usage</h3> + +<p>We did extensive memory analysis downstream in <a href="https://www.dremio.com/">Dremio</a> where Arrow is used +heavily for in-memory query execution on columnar data. The general conclusion +was that Arrow’s Java vector classes have non-negligible heap overhead and +that the volume of objects was too high. 
There were places in the code where we were +creating objects unnecessarily and using structures that could be substituted +with better alternatives.</p> + +<h3 id="no-performance-overhead-on-hot-code-paths">No performance overhead on hot code paths</h3> + +<p>Java vectors used delegation and abstraction heavily throughout the object +hierarchy. The performance critical get/set methods of vectors went through a +chain of function calls back and forth between different objects before doing +meaningful work. We also evaluated the usage of branches in vector APIs and +reimplemented some of them by avoiding branches completely.</p> + +<p>We took inspiration from how the Java memory code in <code class="highlighter-rouge">ArrowBuf</code> works. For all +the performance critical methods, <code class="highlighter-rouge">ArrowBuf</code> bypasses all the netty object +hierarchy, grabs the target virtual address and directly interacts with the +memory.</p> + +<p>There were cases where branches could be avoided altogether.</p> + +<p>In the case of nullable vectors, we were doing multiple checks to confirm if +the value at a given position in the vector is null or not.</p> + +<h2 id="our-implementation-approach">Our implementation approach</h2> + +<ul> + <li>For scalars, the inheritance tree was simplified by writing different +abstract base classes for fixed and variable width scalars.</li> + <li>The base classes contained all the common functionality across different +types.</li> + <li>The individual subclasses implemented type specific APIs for fixed and +variable width scalar vectors.</li> + <li>For the performance critical methods, all the work is done either in +the vector class or corresponding ArrowBuf. There is no delegation to any +internal object.</li> + <li>The mutator and accessor based access to vector APIs is removed. 
These +objects led to unnecessary heap overhead and complicated the use of APIs.</li> + <li>Both scalar and complex vectors directly interact with underlying buffers +that manage the offsets, data and validity. Earlier we were creating different +inner vectors for each vector and delegating all the functionality to inner +vectors. This introduced a lot of bugs in memory management, excessive heap +overhead and performance penalty due to chain of delegations.</li> + <li>We reduced the number of vector classes by removing non-nullable vectors. +In the new implementation, all vectors in Java are nullable in nature.</li> +</ul></content><author><name>Siddharth Teotia</name></author><summary type="html">This post describes the recent improvements in Java Vector code</summary></entry><entry><title type="html">Fast Python Serialization with Ray and Apache Arrow</title><link href="/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow/" rel="alternate" type="text/html" title="Fast Python Serialization with Ray and Apache Arrow" /><published>2017-10-15T10:00:00-04:00</published><updated>2017-10-15T10:00:00-04:00</updated><id>/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow</id><content type="html" xml:base="/blog/2017/10/15/fast-python-serialization-with-ray-and-arrow/"><!-- --> @@ -275,7 +509,7 @@ Benchmarking <code class="highlighter-rouge">ray.put</code> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">test_objects</span><span class="p">)):</span> <span class="n">plot</span><span class="p">(</span><span class="o">*</span><span class="n">benchmark_object</span><span class="p">(</span><span class="n">test_objects</span><span class="p">[</span><span class="n">i</span><span class="p">]),</span> <span class="n">titles</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span 
class="n">i</span><span class="p">)</span> </code></pre> -</div></content><author><name>Philipp Moritz, Robert Nishihara</name></author><summary type="html">This post describes how serialization works in Ray.</summary></entry><entry><title type="html">Apache Arrow 0.7.0 Release</title><link href="/blog/2017/09/18/0.7.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.7.0 Release" /><published>2017-09-18T21:00:00-07:00</published><updated>2017-09-18T21:00:00-07:00</updated><id>/blog/2017/09/18/0.7.0-release</id><content type="html" xml:base="/blog/2017/09/18/0.7.0-release/"><!-- +</div></content><author><name>Philipp Moritz, Robert Nishihara</name></author><summary type="html">This post describes how serialization works in Ray.</summary></entry><entry><title type="html">Apache Arrow 0.7.0 Release</title><link href="/blog/2017/09/19/0.7.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.7.0 Release" /><published>2017-09-19T00:00:00-04:00</published><updated>2017-09-19T00:00:00-04:00</updated><id>/blog/2017/09/19/0.7.0-release</id><content type="html" xml:base="/blog/2017/09/19/0.7.0-release/"><!-- --> @@ -434,7 +668,7 @@ analytics libraries.</p> <p>We are looking for more JavaScript, R, and other programming language developers to join the project and expand the available implementations and -bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Apache Arrow 0.6.0 Release</title><link href="/blog/2017/08/15/0.6.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.6.0 Release" /><published>2017-08-15T21:00:00-07:00</published><updated>2017-08-15T21:00:00-07:00</updated><id>/blog/2017/08/15/0.6.0-release</id><content type="html" xml:base="/blog/2017/08/15/0.6.0-release/"><!-- +bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Apache Arrow 0.6.0 Release</title><link href="/blog/2017/08/16/0.6.0-release/" 
rel="alternate" type="text/html" title="Apache Arrow 0.6.0 Release" /><published>2017-08-16T00:00:00-04:00</published><updated>2017-08-16T00:00:00-04:00</updated><id>/blog/2017/08/16/0.6.0-release</id><content type="html" xml:base="/blog/2017/08/16/0.6.0-release/"><!-- --> @@ -516,7 +750,7 @@ milliseconds, or <code class="highlighter-rouge">'us'</code&g <p>We are still discussing the roadmap to 1.0.0 release on the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">developer mailing list</a>. The focus of the 1.0.0 release will likely be memory format stability and hardening integration tests across the remaining data types implemented in -Java and C++. Please join the discussion there.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Plasma In-Memory Object Store</title><link href="/blog/2017/08/07/plasma-in-memory-object-store/" rel="alternate" type="text/html" title="Plasma In-Memory Object Store" /><published>2017-08-07T21:00:00-07:00</published><updated>2017-08-07T21:00:00-07:00</updated><id>/blog/2017/08/07/plasma-in-memory-object-store</id><content type="html" xml:base="/blog/2017/08/07/plasma-in-memory-object-store/"><!-- +Java and C++. Please join the discussion there.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Plasma In-Memory Object Store</title><link href="/blog/2017/08/08/plasma-in-memory-object-store/" rel="alternate" type="text/html" title="Plasma In-Memory Object Store" /><published>2017-08-08T00:00:00-04:00</published><updated>2017-08-08T00:00:00-04:00</updated><id>/blog/2017/08/08/plasma-in-memory-object-store</id><content type="html" xml:base="/blog/2017/08/08/plasma-in-memory-object-store/"><!-- --> @@ -637,7 +871,7 @@ primarily used in <a href="https://github.com/ray-project/ray">R We are looking for a broader set of use cases to help refine Plasma’s API. 
In addition, we are looking for contributions in a variety of areas including improving performance and building other language bindings. Please let us know -if you are interested in getting involved with the project.</p></content><author><name>Philipp Moritz and Robert Nishihara</name></author></entry><entry><title type="html">Speeding up PySpark with Apache Arrow</title><link href="/blog/2017/07/26/spark-arrow/" rel="alternate" type="text/html" title="Speeding up PySpark with Apache Arrow" /><published>2017-07-26T09:00:00-07:00</published><updated>2017-07-26T09:00:00-07:00</updated><id>/blog/2017/07/26/spark-arrow</id><content type="html" xml:base="/blog/2017/07/26/spark-arrow/"><!-- +if you are interested in getting involved with the project.</p></content><author><name>Philipp Moritz and Robert Nishihara</name></author></entry><entry><title type="html">Speeding up PySpark with Apache Arrow</title><link href="/blog/2017/07/26/spark-arrow/" rel="alternate" type="text/html" title="Speeding up PySpark with Apache Arrow" /><published>2017-07-26T12:00:00-04:00</published><updated>2017-07-26T12:00:00-04:00</updated><id>/blog/2017/07/26/spark-arrow</id><content type="html" xml:base="/blog/2017/07/26/spark-arrow/"><!-- --> @@ -756,7 +990,7 @@ DataFrame (<a href="https://issues.apache.org/jira/browse/SPARK-20791&qu <p>Reaching this first milestone was a group effort from both the Apache Arrow and Spark communities. 
Thanks to the hard work of <a href="https://github.com/wesm">Wes McKinney</a>, <a href="https://github.com/icexelloss">Li Jin</a>, <a href="https://github.com/holdenk">Holden Karau</a>, Reynold Xin, Wenchen Fan, Shane Knapp and many others that -helped push this effort forwards.</p></content><author><name>BryanCutler</name></author></entry><entry><title type="html">Apache Arrow 0.5.0 Release</title><link href="/blog/2017/07/24/0.5.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.5.0 Release" /><published>2017-07-24T21:00:00-07:00</published><updated>2017-07-24T21:00:00-07:00</updated><id>/blog/2017/07/24/0.5.0-release</id><content type="html" xml:base="/blog/2017/07/24/0.5.0-release/"><!-- +helped push this effort forwards.</p></content><author><name>BryanCutler</name></author></entry><entry><title type="html">Apache Arrow 0.5.0 Release</title><link href="/blog/2017/07/25/0.5.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.5.0 Release" /><published>2017-07-25T00:00:00-04:00</published><updated>2017-07-25T00:00:00-04:00</updated><id>/blog/2017/07/25/0.5.0-release</id><content type="html" xml:base="/blog/2017/07/25/0.5.0-release/"><!-- --> @@ -839,7 +1073,7 @@ systems to improve their processing performance and interoperability with other systems.</p> <p>We are discussing the roadmap to a future 1.0.0 release on the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">developer -mailing list</a>. 
Please join the discussion there.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Connecting Relational Databases to the Apache Arrow World with turbodbc</title><link href="/blog/2017/06/16/turbodbc-arrow/" rel="alternate" type="text/html" title="Connecting Relational Databases to the Apache Arrow World with turbodbc" /><published>2017-06-16T01:00:00-07:00</published><updated>2017-06-16T01:00:00-07:00</updated><id>/blog/2017/06/16/turbodbc-arrow</id><content type="html" xml:base="/blog/2017/06/16/turbodbc-arrow/"><!-- +mailing list</a>. Please join the discussion there.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Connecting Relational Databases to the Apache Arrow World with turbodbc</title><link href="/blog/2017/06/16/turbodbc-arrow/" rel="alternate" type="text/html" title="Connecting Relational Databases to the Apache Arrow World with turbodbc" /><published>2017-06-16T04:00:00-04:00</published><updated>2017-06-16T04:00:00-04:00</updated><id>/blog/2017/06/16/turbodbc-arrow</id><content type="html" xml:base="/blog/2017/06/16/turbodbc-arrow/"><!-- --> @@ -918,7 +1152,7 @@ databases.</p> <p>If you would like to learn more about turbodbc, check out the <a href="https://github.com/blue-yonder/turbodbc">GitHub project</a> and the <a href="http://turbodbc.readthedocs.io/">project documentation</a>. 
If you want to learn more about how turbodbc implements the nitty-gritty details, check out parts <a href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/">one</a> and <a href="https://tech.blue-yonder.com/making-of-turbodbc-part-2-c-to-python/">two</a> of the
-<a href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/">“Making of turbodbc”</a> series at <a href="https://tech.blue-yonder.com/">Blue Yonder’s technology blog</a>.</p></content><author><name>MathMagique</name></author></entry><entry><title type="html">Apache Arrow 0.4.1 Release</title><link href="/blog/2017/06/14/0.4.1-release/" rel="alternate" type="text/html" title="Apache Arrow 0.4.1 Release" /><published>2017-06-14T07:00:00-07:00</published><updated>2017-06-14T07:00:00-07:00</updated><id>/blog/2017/06/14/0.4.1-release</id><content type="html" xml:base="/blog/2017/06/14/0.4.1-release/"><!--
+<a href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/">“Making of turbodbc”</a> series at <a href="https://tech.blue-yonder.com/">Blue Yonder’s technology blog</a>.</p></content><author><name>MathMagique</name></author></entry><entry><title type="html">Apache Arrow 0.4.1 Release</title><link href="/blog/2017/06/14/0.4.1-release/" rel="alternate" type="text/html" title="Apache Arrow 0.4.1 Release" /><published>2017-06-14T10:00:00-04:00</published><updated>2017-06-14T10:00:00-04:00</updated><id>/blog/2017/06/14/0.4.1-release</id><content type="html" xml:base="/blog/2017/06/14/0.4.1-release/"><!--
--> @@ -953,289 +1187,4 @@ team used the PyArrow C++ API introduced in version 0.4.0 to construct <div class="highlighter-rouge"><pre class="highlight"><code>pip install turbodbc conda install turbodbc -c conda-forge </code></pre> -</div></content><author><name>wesm</name></author></entry><entry><title type="html">Apache Arrow 0.4.0 Release</title><link
href="/blog/2017/05/22/0.4.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.4.0 Release" /><published>2017-05-22T21:00:00-07:00</published><updated>2017-05-22T21:00:00-07:00</updated><id>/blog/2017/05/22/0.4.0-release</id><content type="html" xml:base="/blog/2017/05/22/0.4.0-release/"><!-- - ---> - -<p>The Apache Arrow team is pleased to announce the 0.4.0 release of the -project. Though only 17 days have passed since the previous release, it includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.4.0"><strong>77 resolved -JIRAs</strong></a> with some important new features and bug fixes.</p> - -<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your platform.</p> - -<h3 id="expanded-javascript-implementation">Expanded JavaScript Implementation</h3> - -<p>The TypeScript Arrow implementation has undergone some work since 0.3.0 and can -now read a substantial portion of the Arrow streaming binary format. As this -implementation develops, we will eventually want to include JS in the -integration test suite along with Java and C++ to ensure wire -cross-compatibility.</p> - -<h3 id="python-support-for-apache-parquet-on-windows">Python Support for Apache Parquet on Windows</h3> - -<p>With the <a href="https://github.com/apache/parquet-cpp/releases/tag/apache-parquet-cpp-1.1.0">1.1.0 C++ release</a> of <a href="http://parquet.apache.org">Apache Parquet</a>, we have enabled the -<code class="highlighter-rouge">pyarrow.parquet</code> extension on Windows for Python 3.5 and 3.6. This should -appear in conda-forge packages and PyPI in the near future.
Developers can -follow the <a href="http://arrow.apache.org/docs/python/development.html">source build instructions</a>.</p> - -<h3 id="generalizing-arrow-streams">Generalizing Arrow Streams</h3> - -<p>In the 0.2.0 release, we defined the first version of the Arrow streaming -binary format for low-cost messaging with columnar data. These streams presume -that the message components are written as a continuous byte stream over a -socket or file.</p> - -<p>We would like to be able to support other transport protocols, like -<a href="http://grpc.io/">gRPC</a>, for the message components of Arrow streams. To that end, in C++ we -defined an abstract stream reader interface, for which the current contiguous -streaming format is one implementation:</p> - -<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">class</span> <span class="nc">RecordBatchReader</span> <span class="p">{</span> - <span class="k">public</span><span class="o">:</span> - <span class="k">virtual</span> <span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">Schema</span><span class="o">&gt;</span> <span class="n">schema</span><span class="p">()</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> - <span class="k">virtual</span> <span class="n">Status</span> <span class="n">GetNextRecordBatch</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">RecordBatch</span><span class="o">&gt;*</span> <span class="n">batch</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> -<span class="p">};</span></code></pre></figure> - -<p>It would also be good to define abstract stream reader and writer interfaces in -the Java implementation.</p> - -<p>In an upcoming blog post, we will explain in
more depth how Arrow streams work, -but you can learn more about them by reading the <a href="http://arrow.apache.org/docs/ipc.html">IPC specification</a>.</p> - -<h3 id="c-and-cython-api-for-python-extensions">C++ and Cython API for Python Extensions</h3> - -<p>As other Python libraries with C or C++ extensions use Apache Arrow, they will -need to be able to return Python objects wrapping the underlying C++ -objects. In this release, we have implemented a prototype C++ API which enables -Python wrapper objects to be constructed from C++ extension code:</p> - -<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#include "arrow/python/pyarrow.h" -</span> -<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">arrow</span><span class="o">::</span><span class="n">py</span><span class="o">::</span><span class="n">import_pyarrow</span><span class="p">())</span> <span class="p">{</span> - <span class="c1">// Error -</span><span class="p">}</span> - -<span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">arrow</span><span class="o">::</span><span class="n">RecordBatch</span><span class="o">&gt;</span> <span class="n">cpp_batch</span> <span class="o">=</span> <span class="n">GetData</span><span class="p">(...);</span> -<span class="n">PyObject</span><span class="o">*</span> <span class="n">py_batch</span> <span class="o">=</span> <span class="n">arrow</span><span class="o">::</span><span class="n">py</span><span class="o">::</span><span class="n">wrap_batch</span><span class="p">(</span><span class="n">cpp_batch</span><span class="p">);</span></code></pre></figure> - -<p>This API is intended to be usable from Cython code as well:</p> - -<figure class="highlight"><pre><code class="language-cython" data-lang="cython">cimport pyarrow -pyarrow.import_pyarrow()</code></pre></figure> - -<h3 id="python-wheel-installers-on-macos">Python 
Wheel Installers on macOS</h3> - -<p>With this release, <code class="highlighter-rouge">pip install pyarrow</code> works on macOS (OS X) as well as -Linux. We are working on providing binary wheel installers for Windows as well.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Apache Arrow 0.3.0 Release</title><link href="/blog/2017/05/07/0.3-release/" rel="alternate" type="text/html" title="Apache Arrow 0.3.0 Release" /><published>2017-05-07T21:00:00-07:00</published><updated>2017-05-07T21:00:00-07:00</updated><id>/blog/2017/05/07/0.3-release</id><content type="html" xml:base="/blog/2017/05/07/0.3-release/"><!-- - ---> - -<p>Translations: <a href="/blog/2017/05/07/0.3-release-japanese/">日本語</a></p> - -<p>The Apache Arrow team is pleased to announce the 0.3.0 release of the -project. It is the product of an intense 10 weeks of development since the -0.2.0 release from this past February. It includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.3.0"><strong>306 resolved JIRAs</strong></a> -from <a href="https://github.com/apache/arrow/graphs/contributors"><strong>23 contributors</strong></a>.</p> - -<p>While we have added many new features to the different Arrow implementations, -one of the major development focuses in 2017 has been hardening the in-memory -format, type metadata, and messaging protocol to provide a <strong>stable, -production-ready foundation</strong> for big data applications.
We are excited to be -collaborating with the <a href="http://spark.apache.org">Apache Spark</a> and <a href="http://www.geomesa.org/">GeoMesa</a> communities on -utilizing Arrow for high performance IO and in-memory data processing.</p> - -<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your platform.</p> - -<p>We will be publishing more information about the Apache Arrow roadmap as we -forge ahead with using Arrow to accelerate big data systems.</p> - -<p>We are looking for more contributors from within our existing communities and -from other communities (such as Go, R, or Julia) to get involved in Arrow -development.</p> - -<h3 id="file-and-streaming-format-hardening">File and Streaming Format Hardening</h3> - -<p>The 0.2.0 release brought with it the first iterations of the <strong>random access</strong> -and <strong>streaming</strong> Arrow wire formats. See the <a href="http://arrow.apache.org/docs/ipc.html">IPC specification</a> for -implementation details and <a href="http://wesmckinney.com/blog/arrow-streaming-columnar/">example blog post</a> with some use cases. These -provide low-overhead, zero-copy access to Arrow record batch payloads.</p> - -<p>In 0.3.0 we have solidified a number of small details with the binary format -and improved our integration and unit testing particularly in the Java, C++, -and Python libraries. Using the <a href="http://github.com/google/flatbuffers">Google Flatbuffers</a> project has helped with -adding new features to our metadata without breaking forward compatibility.</p> - -<p>We are not yet ready to make a firm commitment to strong forward compatibility -(in case we find something needs to change) in the binary format, but we will -make efforts between major releases to not make unnecessary -breakages. 
Contributions to the website and component user and API -documentation would also be most welcome.</p> - -<h3 id="dictionary-encoding-support">Dictionary Encoding Support</h3> - -<p><a href="https://github.com/elahrvivaz">Emilio Lahr-Vivaz</a> from the <a href="http://www.geomesa.org/">GeoMesa</a> project contributed Java support -for dictionary-encoded Arrow vectors. We followed up with C++ and Python -support (and <code class="highlighter-rouge">pandas.Categorical</code> integration). We have not yet implemented -full integration tests for dictionaries (for sending this data between C++ and -Java), but hope to achieve this in the 0.4.0 Arrow release.</p> - -<p>This common data representation technique for categorical data allows multiple -record batches to share a common “dictionary”, with the values in the batches -being represented as integers referencing the dictionary. This data is called -“categorical” or “factor” in statistical languages, while in file formats like -Apache Parquet it is strictly used for data compression.</p> - -<h3 id="expanded-date-time-and-fixed-size-types">Expanded Date, Time, and Fixed Size Types</h3> - -<p>A notable omission from the 0.2.0 release was complete and integration-tested -support for the gamut of date and time types that occur in the wild.
These are -needed for <a href="http://parquet.apache.org">Apache Parquet</a> and Apache Spark integration.</p> - -<ul> - <li><strong>Date</strong>: 32-bit (days unit) and 64-bit (milliseconds unit)</li> - <li><strong>Time</strong>: 64-bit integer with unit (second, millisecond, microsecond, nanosecond)</li> - <li><strong>Timestamp</strong>: 64-bit integer with unit, with or without timezone</li> - <li><strong>Fixed Size Binary</strong>: Primitive values occupying a certain number of bytes</li> - <li><strong>Fixed Size List</strong>: List values with constant size (no separate offsets vector)</li> -</ul> - -<p>We have additionally added experimental support for exact decimals in C++ using -<a href="https://github.com/boostorg/multiprecision">Boost.Multiprecision</a>, though we have not yet hardened the Decimal memory -format between the Java and C++ implementations.</p> - -<h3 id="c-and-python-support-on-windows">C++ and Python Support on Windows</h3> - -<p>We have made many general improvements to development and packaging for general -C++ and Python development. 0.3.0 is the first release to bring full C++ and -Python support for Windows on Visual Studio (MSVC) 2015 and 2017.
In addition -to adding Appveyor continuous integration for MSVC, we have also written guides -for building from source on Windows: <a href="https://github.com/apache/arrow/blob/master/cpp/apidoc/Windows.md">C++</a> and <a href="https://github.com/apache/arrow/blob/master/python/doc/source/development.rst">Python</a>.</p> - -<p>For the first time, you can install the Arrow Python library on Windows from -<a href="https://conda-forge.github.io">conda-forge</a>:</p> - -<div class="language-shell highlighter-rouge"><pre class="highlight"><code>conda install pyarrow -c conda-forge -</code></pre> -</div> - -<h3 id="c-glib-bindings-with-support-for-ruby-lua-and-more">C (GLib) Bindings, with support for Ruby, Lua, and more</h3> - -<p><a href="http://github.com/kou">Kouhei Sutou</a> is a new Apache Arrow contributor and has contributed GLib C -bindings (to the C++ libraries) for Linux. Using a C middleware framework -called <a href="https://wiki.gnome.org/Projects/GObjectIntrospection">GObject Introspection</a>, it is possible to use these bindings -seamlessly in Ruby, Lua, Go, and <a href="https://wiki.gnome.org/Projects/GObjectIntrospection/Users">other programming languages</a>. We will -probably need to publish some follow up blogs explaining how these bindings -work and how to use them.</p> - -<h3 id="apache-spark-integration-for-pyspark">Apache Spark Integration for PySpark</h3> - -<p>We have been collaborating with the Apache Spark community on <a href="https://issues.apache.org/jira/browse/SPARK-13534">SPARK-13534</a> -to add support for using Arrow to accelerate <code class="highlighter-rouge">DataFrame.toPandas</code> in -PySpark. We have observed over <a href="https://github.com/apache/spark/pull/15821#issuecomment-282175163"><strong>40x speedup</strong></a> from the more efficient -data serialization.</p> - -<p>Using Arrow in PySpark opens the door to many other performance optimizations, -particularly around UDF evaluation (e.g. 
<code class="highlighter-rouge">map</code> and <code class="highlighter-rouge">filter</code> operations with -Python lambda functions).</p> - -<h3 id="new-python-feature-memory-views-feather-apache-parquet-support">New Python Feature: Memory Views, Feather, Apache Parquet support</h3> - -<p>Arrow’s Python library <code class="highlighter-rouge">pyarrow</code> is a Cython binding for the <code class="highlighter-rouge">libarrow</code> and -<code class="highlighter-rouge">libarrow_python</code> C++ libraries, which handle interoperability with NumPy, -<a href="http://pandas.pydata.org">pandas</a>, and the Python standard library.</p> - -<p>At the heart of Arrow’s C++ libraries is the <code class="highlighter-rouge">arrow::Buffer</code> object, which is a -managed memory view supporting zero-copy reads and slices. <a href="https://github.com/JeffKnupp">Jeff Knupp</a> -contributed integration between Arrow buffers and the Python buffer protocol -and memoryviews, so now code like this is possible:</p> - -<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span> - -<span class="n">In</span> <span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">frombuffer</span><span class="p">(</span><span class="n">b</span><span class="s">'foobarbaz'</span><span class="p">)</span> - -<span class="n">In</span> <span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="n">buf</span> -<span class="n">Out</span><span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="o">&lt;</span><span class="n">pyarrow</span><span class="o">.</span><span class="n">_io</span><span
class="o">.</span><span class="n">Buffer</span> <span class="n">at</span> <span class="mh">0x7f6c0a84b538</span><span class="o">&gt;</span> - -<span class="n">In</span> <span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="n">memoryview</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> -<span class="n">Out</span><span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="o">&lt;</span><span class="n">memory</span> <span class="n">at</span> <span class="mh">0x7f6c0a8c5e88</span><span class="o">&gt;</span> - -<span class="n">In</span> <span class="p">[</span><span class="mi">10</span><span class="p">]:</span> <span class="n">buf</span><span class="o">.</span><span class="n">to_pybytes</span><span class="p">()</span> -<span class="n">Out</span><span class="p">[</span><span class="mi">10</span><span class="p">]:</span> <span class="n">b</span><span class="s">'foobarbaz'</span> -</code></pre> -</div> - -<p>We have significantly expanded <a href="http://parquet.apache.org"><strong>Apache Parquet</strong></a> support via the C++ -Parquet implementation <a href="https://github.com/apache/parquet-cpp">parquet-cpp</a>. This includes support for partitioned -datasets on disk or in HDFS. We added initial Arrow-powered Parquet support <a href="https://github.com/dask/dask/commit/68f9e417924a985c1f2e2a587126833c70a2e9f4">in -the Dask project</a>, and look forward to more collaborations with the Dask -developers on distributed processing of pandas data.</p> - -<p>With Arrow’s support for pandas maturing, we were able to merge in the -<a href="https://github.com/wesm/feather"><strong>Feather format</strong></a> implementation, which is essentially a special case of -the Arrow random access format. We’ll be continuing Feather development within -the Arrow codebase.
For example, Feather can now read and write with Python -file objects using Arrow’s Python binding layer.</p> - -<p>We also implemented more robust support for pandas-specific data types, like -<code class="highlighter-rouge">DatetimeTZ</code> and <code class="highlighter-rouge">Categorical</code>.</p> - -<h3 id="support-for-tensors-and-beyond-in-c-library">Support for Tensors and beyond in C++ Library</h3> - -<p>There has been increased interest in using Apache Arrow as a tool for zero-copy -shared memory management for machine learning applications. A flagship example -is the <a href="https://github.com/ray-project/ray">Ray project</a> from the UC Berkeley <a href="https://rise.cs.berkeley.edu/">RISELab</a>.</p> - -<p>Machine learning deals in additional kinds of data structures beyond what the -Arrow columnar format supports, like multidimensional arrays aka “tensors”. As -such, we implemented the <a href="http://arrow.apache.org/docs/cpp/classarrow_1_1_tensor.html"><code class="highlighter-rouge">arrow::Tensor</code></a> C++ type which can utilize the -rest of Arrow’s zero-copy shared memory machinery (using <code class="highlighter-rouge">arrow::Buffer</code> for -managing memory lifetime). In C++ in particular, we will want to provide for -additional data structures utilizing common IO and memory management tools.</p> - -<h3 id="start-of-javascript-typescript-implementation">Start of JavaScript (TypeScript) Implementation</h3> - -<p><a href="https://github.com/TheNeuralBit">Brian Hulette</a> started developing an Arrow implementation in -<a href="https://github.com/apache/arrow/tree/master/js">TypeScript</a> for use in NodeJS and browser-side applications.
We are -benefitting from Flatbuffers’ first class support for JavaScript.</p> - -<h3 id="improved-website-and-developer-documentation">Improved Website and Developer Documentation</h3> - -<p>Since 0.2.0 we have implemented a new website stack for publishing -documentation and blogs based on <a href="https://jekyllrb.com">Jekyll</a>. Kouhei Sutou developed a <a href="https://github.com/red-data-tools/jekyll-jupyter-notebook">Jekyll -Jupyter Notebook plugin</a> so that we can use Jupyter to author content for -the Arrow website.</p> - -<p>On the website, we have now published API documentation for the C, C++, Java, -and Python subcomponents. Within these you will find easier-to-follow developer -instructions for getting started.</p> - -<h3 id="contributors">Contributors</h3> - -<p>Thanks to all who contributed patches to this release.</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>$ git shortlog -sn apache-arrow-0.2.0..apache-arrow-0.3.0 - 119 Wes McKinney - 55 Kouhei Sutou - 18 Uwe L.
Korn - 17 Julien Le Dem - 9 Phillip Cloud - 6 Bryan Cutler - 5 Philipp Moritz - 5 Emilio Lahr-Vivaz - 4 Max Risuhin - 4 Johan Mabille - 4 Jeff Knupp - 3 Steven Phillips - 3 Miki Tebeka - 2 Leif Walsh - 2 Jeff Reback - 2 Brian Hulette - 1 Tsuyoshi Ozawa - 1 rvernica - 1 Nong Li - 1 Julien Lafaye - 1 Itai Incze - 1 Holden Karau - 1 Deepak Majeti -</code></pre> </div></content><author><name>wesm</name></author></entry></feed> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/arrow-site/blob/0a7dc418/install/index.html ---------------------------------------------------------------------- diff --git a/install/index.html b/install/index.html index c225aa5..9c82739 100644 --- a/install/index.html +++ b/install/index.html @@ -127,8 +127,9 @@ <ul> <li><strong>Source Release</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz">apache-arrow-0.8.0.tar.gz</a></li> - <li><strong>Verification</strong>: <a href="https://dist.apache.org/repos/dist/release/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.sha512">sha512</a>, <a href="https://dist.apache.org/repos/dist/release/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.asc">asc</a></li> + <li><strong>Verification</strong>: <a href="https://www.apache.org/dist/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.sha512">sha512</a>, <a href="https://www.apache.org/dist/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.asc">asc</a> (<a href="https://www.apache.org/dyn/closer.cgi#verify">verification instructions</a>)</li> <li><a href="https://github.com/apache/arrow/releases/tag/apache-arrow-0.8.0">Git tag 1d689e5</a></li> + <li><a href="http://www.apache.org/dist/arrow/KEYS">PGP keys for release signatures</a></li> </ul> <h3 id="java-packages">Java Packages</h3>
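The dictionary encoding described in the 0.3.0 release notes above — each distinct value stored once in a shared "dictionary", with the column itself represented as integer references into it — can be sketched in plain Python. This is an illustrative model of the technique only, not the pyarrow or Java vector API; the function names are hypothetical:

```python
def dictionary_encode(values):
    """Model of Arrow-style dictionary encoding: return (dictionary, indices)."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> its position in the dictionary
    indices = []      # one integer code per input value
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Decoding is a simple gather through the dictionary."""
    return [dictionary[i] for i in indices]


colors = ["red", "green", "red", "blue", "green", "red"]
dictionary, indices = dictionary_encode(colors)
# dictionary == ["red", "green", "blue"]; indices == [0, 1, 0, 2, 1, 0]
assert dictionary_decode(dictionary, indices) == colors
```

Because the dictionary is stored once, repeated categorical values cost one small integer each, which is why Parquet uses the same idea purely for compression.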
