Repository: arrow-site Updated Branches: refs/heads/asf-site 5460ea7f9 -> 29105b5e1
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/docs/ipc.html ---------------------------------------------------------------------- diff --git a/docs/ipc.html b/docs/ipc.html index 99c6ec7..6e1da0e 100644 --- a/docs/ipc.html +++ b/docs/ipc.html @@ -105,17 +105,22 @@ --> <!--- - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. See accompanying LICENSE file. + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
--> <h1 id="interprocess-messaging--communication-ipc">Interprocess messaging / communication (IPC)</h1> @@ -129,7 +134,7 @@ <li>A length prefix indicating the metadata size</li> <li>The message metadata as a <a href="https://github.com/google/flatbuffers">Flatbuffer</a></li> <li>Padding bytes to an 8-byte boundary</li> - <li>The message body</li> + <li>The message body, whose length must be a multiple of 8 bytes</li> </ul> <p>Schematically, we have:</p> @@ -141,6 +146,10 @@ </code></pre> </div> +<p>The complete serialized message must be a multiple of 8 bytes so that messages +can be relocated between streams. Otherwise, the amount of padding between the +metadata and the message body could be non-deterministic.</p> + <p>The <code class="highlighter-rouge">metadata_size</code> includes the size of the flatbuffer plus padding. The <code class="highlighter-rouge">Message</code> flatbuffer includes a version number, the particular message (as a flatbuffer union), and the size of the message body:</p> @@ -259,6 +268,10 @@ the body of buffers is stored in the file footer:</p> </code></pre> </div> +<p>The <code class="highlighter-rouge">metaDataLength</code> here includes the metadata length prefix, serialized +metadata, and any additional padding bytes, and by construction must be a +multiple of 8 bytes.</p> + <p>Some notes about this:</p> <ul> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/docs/memory_layout.html ---------------------------------------------------------------------- diff --git a/docs/memory_layout.html b/docs/memory_layout.html index b1031c3..d9d2bd9 100644 --- a/docs/memory_layout.html +++ b/docs/memory_layout.html @@ -105,17 +105,22 @@ --> <!--- - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License.
- You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. See accompanying LICENSE file. + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. --> <h1 id="arrow-physical-memory-layout">Arrow: Physical memory layout</h1> @@ -166,7 +171,11 @@ proprietary systems that utilize the open source components.</li> linearly in the nesting level</li> <li>Capable of representing fully-materialized and decoded / decompressed <a href="https://parquet.apache.org/documentation/latest/">Parquet</a> data</li> - <li>All contiguous memory buffers are aligned at 64-byte boundaries and padded to a multiple of 64 bytes.</li> + <li>It is required to have all the contiguous memory buffers in an IPC payload +aligned at 8-byte boundaries. 
In other words, each buffer must start at +an aligned 8-byte offset.</li> + <li>The general recommendation is to align the buffers at a 64-byte boundary, but +this is not absolutely necessary.</li> <li>Any relative type can have null slots</li> <li>Arrays are immutable once created. Implementations can provide APIs to mutate an array, but applying mutations will require a new array data structure to @@ -217,9 +226,9 @@ via byte swapping.</p> <h2 id="alignment-and-padding">Alignment and Padding</h2> -<p>As noted above, all buffers are intended to be aligned in memory at 64 byte -boundaries and padded to a length that is a multiple of 64 bytes. The alignment -requirement follows best practices for optimized memory access:</p> +<p>As noted above, all buffers must be aligned in memory at 8-byte boundaries and padded +to a length that is a multiple of 8 bytes. The alignment requirement follows best +practices for optimized memory access:</p> <ul> <li>Elements in numeric arrays will be guaranteed to be retrieved via aligned access.</li> @@ -228,12 +237,14 @@ requirement follows best practices for optimized memory access:</p> data-structures over 64 bytes (which will be a common case for Arrow Arrays).</li> </ul> -<p>Requiring padding to a multiple of 64 bytes allows for using <a href="https://software.intel.com/en-us/node/600110">SIMD</a> instructions +<p>Recommending padding to a multiple of 64 bytes allows for using <a href="https://software.intel.com/en-us/node/600110">SIMD</a> instructions consistently in loops without additional conditional checks. -This should allow for simpler and more efficient code. +This should allow for simpler, more efficient, and CPU cache-friendly code. The specific padding length was chosen because it matches the largest known -SIMD instruction registers available as of April 2016 (Intel AVX-512). -Guaranteed padding can also allow certain compilers +SIMD instruction registers available as of April 2016 (Intel AVX-512).
In other +words, we can load the entire 64-byte buffer into a 512-bit wide SIMD register +and get data-level parallelism on all the columnar values packed into the 64-byte +buffer. Guaranteed padding can also allow certain compilers +to generate more optimized code directly (e.g., one can safely use Intel's <code class="highlighter-rouge">-qopt-assume-safe-padding</code>).</p> @@ -312,7 +323,7 @@ does not need to be adjacent in memory to the values buffer.</p> <h3 id="example-layout-int32-array">Example Layout: Int32 Array</h3> <p>For example, a primitive array of int32s:</p> -<p>[1, 2, null, 4, 8]</p> +<p>[1, null, 2, 4, 8]</p> <p>Would look like:</p> @@ -321,13 +332,13 @@ does not need to be adjacent in memory to the values buffer.</p> |Byte 0 (validity bitmap) | Bytes 1-63 | |-------------------------|-----------------------| - | 00011011 | 0 (padding) | + | 00011101 | 0 (padding) | * Value Buffer: |Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-19 | Bytes 20-63 | |------------|-------------|-------------|-------------|-------------|-------------| - | 1 | 2 | unspecified | 4 | 8 | unspecified | + | 1 | unspecified | 2 | 4 | 8 | unspecified | </code></pre> </div> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/docs/metadata.html ---------------------------------------------------------------------- diff --git a/docs/metadata.html b/docs/metadata.html index fb18e4a..24147ad 100644 --- a/docs/metadata.html +++ b/docs/metadata.html @@ -105,17 +105,22 @@ --> <!--- - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and - limitations under the License. See accompanying LICENSE file. + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. --> <h1 id="metadata-logical-types-schemas-data-headers">Metadata: Logical types, schemas, data headers</h1> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/feed.xml ---------------------------------------------------------------------- diff --git a/feed.xml b/feed.xml index 6020d07..dc62590 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,163 @@ -<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-08-31T11:43:33-04:00</updated><id>/</id><entry><title type="html">Apache Arrow 0.6.0 Release</title><link href="/blog/2017/08/16/0.6.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.6.0 Release" /><published>2017-08-16T00:00:00-04:00</published><updated>2017-08-16T00:00:00-04:00</updated><id>/blog/2017/08/16/0.6.0-release</id><content type="html" xml:base="/blog/2017/08/16/0.6.0-release/"><!-- +<?xml version="1.0" 
encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-09-19T09:11:04-04:00</updated><id>/</id><entry><title type="html">Apache Arrow 0.7.0 Release</title><link href="/blog/2017/09/19/0.7.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.7.0 Release" /><published>2017-09-19T00:00:00-04:00</published><updated>2017-09-19T00:00:00-04:00</updated><id>/blog/2017/09/19/0.7.0-release</id><content type="html" xml:base="/blog/2017/09/19/0.7.0-release/"><!-- + +--> + +<p>The Apache Arrow team is pleased to announce the 0.7.0 release. It includes +<a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.7.0"><strong>133 resolved JIRAs</strong></a>, with many new features and bug fixes across the various +language implementations. The Arrow memory format has remained stable since the +0.3.x release.</p> + +<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your +platform. The <a href="http://arrow.apache.org/release/0.7.0.html">complete changelog</a> is also available.</p> + +<p>We include some highlights from the release in this post.</p> + +<h2 id="new-pmc-member-kouhei-sutou">New PMC Member: Kouhei Sutou</h2> + +<p>Since the last release, we have added <a href="https://github.com/kou">Kou</a> to the Arrow Project Management +Committee. 
He is also a PMC member of Apache Subversion, and a major contributor to +many other open source projects.</p> + +<p>As an active member of the Ruby community in Japan, Kou has been developing the +GLib-based C bindings for Arrow with associated Ruby wrappers, to enable Ruby +users to benefit from the work that's happening in Apache Arrow.</p> + +<p>We are excited to be collaborating with the Ruby community on shared +infrastructure for in-memory analytics and data science.</p> + +<h2 id="expanded-javascript-typescript-implementation">Expanded JavaScript (TypeScript) Implementation</h2> + +<p><a href="https://github.com/trxcllnt">Paul Taylor</a> from the <a href="https://github.com/netflix/falcor">Falcor</a> and <a href="http://reactivex.io">ReactiveX</a> projects has worked to +expand the JavaScript implementation (which is written in TypeScript), using +the latest in modern JavaScript build and packaging technology. We are looking +forward to building out the JS implementation and bringing it up to full +functionality with the C++ and Java implementations.</p> + +<p>We are looking for more JavaScript developers to join the project and work +together to make Arrow for JS work well with many kinds of front-end use cases, +like real-time data visualization.</p> + +<h2 id="type-casting-for-c-and-python">Type casting for C++ and Python</h2> + +<p>As part of longer-term efforts to build an Arrow-native in-memory analytics +library, we implemented a variety of type conversion functions. These functions +are essential in ETL tasks when conforming one table schema to another. 
These +are similar to the <code class="highlighter-rouge">astype</code> function in NumPy.</p> + +<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">17</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">18</span><span class="p">]:</span> <span class="n">arr</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="bp">True</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">True</span><span class="p">])</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">19</span><span class="p">]:</span> <span class="n">arr</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">19</span><span class="p">]:</span> +<span class="o">&lt;</span><span class="n">pyarrow</span><span class="o">.</span><span class="n">lib</span><span class="o">.</span><span class="n">BooleanArray</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x7ff6fb069b88</span><span class="o">&gt;</span> +<span class="p">[</span> + <span class="bp">True</span><span class="p">,</span> + <span class="bp">False</span><span class="p">,</span> + <span class="n">NA</span><span class="p">,</span> + <span class="bp">True</span> +<span class="p">]</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">20</span><span class="p">]:</span> <span class="n">arr</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">pa</span><span class="o">.</span><span class="n">int32</span><span class="p">())</span> +<span class="n">Out</span><span 
class="p">[</span><span class="mi">20</span><span class="p">]:</span> +<span class="o">&lt;</span><span class="n">pyarrow</span><span class="o">.</span><span class="n">lib</span><span class="o">.</span><span class="n">Int32Array</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x7ff6fb0383b8</span><span class="o">&gt;</span> +<span class="p">[</span> + <span class="mi">1</span><span class="p">,</span> + <span class="mi">0</span><span class="p">,</span> + <span class="n">NA</span><span class="p">,</span> + <span class="mi">1</span> +<span class="p">]</span> +</code></pre> +</div> + +<p>Over time, these will expand to support as many input-and-output type +combinations as possible, with optimized conversions.</p> + +<h2 id="new-arrow-gpu-cuda-extension-library-for-c">New Arrow GPU (CUDA) Extension Library for C++</h2> + +<p>To help with GPU-related projects using Arrow, like the <a href="http://gpuopenanalytics.com/">GPU Open Analytics +Initiative</a>, we have started a C++ add-on library to simplify Arrow memory +management on CUDA-enabled graphics cards. 
We would like to expand this to +include a library of reusable CUDA kernel functions for GPU analytics on Arrow +columnar memory.</p> + +<p>For example, we could write a record batch from CPU memory to GPU device memory +like so (some error checking omitted):</p> + +<div class="language-c++ highlighter-rouge"><pre class="highlight"><code><span class="cp">#include &lt;arrow/api.h&gt; +#include &lt;arrow/gpu/cuda_api.h&gt; +</span> +<span class="k">using</span> <span class="k">namespace</span> <span class="n">arrow</span><span class="p">;</span> + +<span class="n">gpu</span><span class="o">::</span><span class="n">CudaDeviceManager</span><span class="o">*</span> <span class="n">manager</span><span class="p">;</span> +<span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">gpu</span><span class="o">::</span><span class="n">CudaContext</span><span class="o">&gt;</span> <span class="n">context</span><span class="p">;</span> + +<span class="n">gpu</span><span class="o">::</span><span class="n">CudaDeviceManager</span><span class="o">::</span><span class="n">GetInstance</span><span class="p">(</span><span class="o">&amp;</span><span class="n">manager</span><span class="p">);</span> +<span class="n">manager</span><span class="o">-&gt;</span><span class="n">GetContext</span><span class="p">(</span><span class="n">kGpuNumber</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">context</span><span class="p">);</span> + +<span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">RecordBatch</span><span class="o">&gt;</span> <span class="n">batch</span> <span class="o">=</span> <span class="n">GetCpuData</span><span class="p">();</span> + +<span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">gpu</span><span class="o">::</span><span 
class="n">CudaBuffer</span><span class="o">&gt;</span> <span class="n">device_serialized</span><span class="p">;</span> +<span class="n">gpu</span><span class="o">::</span><span class="n">SerializeRecordBatch</span><span class="p">(</span><span class="o">*</span><span class="n">batch</span><span class="p">,</span> <span class="n">context</span><span class="p">.</span><span class="n">get</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">device_serialized</span><span class="p">);</span> +</code></pre> +</div> + +<p>We can then "read" the GPU record batch, but the returned <code class="highlighter-rouge">arrow::RecordBatch</code> +internally will contain GPU device pointers that you can use for CUDA kernel +calls:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>std::shared_ptr&lt;RecordBatch&gt; device_batch; +gpu::ReadRecordBatch(batch-&gt;schema(), device_serialized, + default_memory_pool(), &amp;device_batch); + +// Now run some CUDA kernels on device_batch +</code></pre> +</div> + +<h2 id="decimal-integration-tests">Decimal Integration Tests</h2> + +<p><a href="http://github.com/cpcloud">Phillip Cloud</a> has been working on decimal support in C++ to enable Parquet +read/write support in C++ and Python, and also end-to-end testing against the +Arrow Java libraries.</p> + +<p>In the upcoming releases, we hope to complete the remaining data types that +need end-to-end testing between Java and C++:</p> + +<ul> + <li>Fixed size lists (variable-size lists already implemented)</li> + <li>Fixed size binary</li> + <li>Unions</li> + <li>Maps</li> + <li>Time intervals</li> +</ul> + +<h2 id="other-notable-python-changes">Other Notable Python Changes</h2> + +<p>Some highlights of Python development outside of bug fixes and general API +improvements include:</p> + +<ul> + <li>Simplified <code class="highlighter-rouge">put</code> and <code class="highlighter-rouge">get</code> of arbitrary Python objects in Plasma objects</li> + 
<li><a href="http://arrow.apache.org/docs/python/ipc.html">High-speed, memory-efficient object serialization</a>. This is important +enough that we will likely write a dedicated blog post about it.</li> + <li>New <code class="highlighter-rouge">flavor='spark'</code> option to <code class="highlighter-rouge">pyarrow.parquet.write_table</code> to enable easy +writing of Parquet files that maximize Spark compatibility</li> + <li><code class="highlighter-rouge">parquet.write_to_dataset</code> function with support for partitioned writes</li> + <li>Improved support for Dask filesystems</li> + <li>Improved Python usability for IPC: read and write schemas and record batches +more easily. See the <a href="http://arrow.apache.org/docs/python/api.html">API docs</a> for more about these.</li> +</ul> + +<h2 id="the-road-ahead">The Road Ahead</h2> + +<p>Upcoming Arrow releases will continue to expand the project to cover more use +cases. In addition to completing end-to-end testing for all the major data +types, some of us will be shifting attention to building Arrow-native in-memory +analytics libraries.</p> + +<p>We are looking for more JavaScript, R, and other programming language +developers to join the project and expand the available implementations and +bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Apache Arrow 0.6.0 Release</title><link href="/blog/2017/08/16/0.6.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.6.0 Release" /><published>2017-08-16T00:00:00-04:00</published><updated>2017-08-16T00:00:00-04:00</updated><id>/blog/2017/08/16/0.6.0-release</id><content type="html" xml:base="/blog/2017/08/16/0.6.0-release/"><!-- --> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/index.html ---------------------------------------------------------------------- diff --git a/index.html b/index.html index d9235a0..ad76caf 100644 --- a/index.html +++ b/index.html @@ -106,26 +106,37 @@ 
<p class="lead">Powering Columnar In-Memory Analytics</p> <p> <a class="btn btn-lg btn-success" href="mailto:dev-subscr...@arrow.apache.org" role="button">Join Mailing List</a> - <a class="btn btn-lg btn-primary" href="/install/" role="button">Install (0.6.0 Release - August 14, 2017)</a> + <a class="btn btn-lg btn-primary" href="/install/" role="button">Install (0.7.0 Release - September 17, 2017)</a> </p> </div> - <h4><strong>Latest News</strong>: <a href="/blog/">Apache Arrow 0.6.0 release</a></h4> + <h4><strong>Latest News</strong>: <a href="/blog/">Apache Arrow 0.7.0 release</a></h4> <div class="row"> <div class="col-lg-4"> <h2>Fast</h2> - <p>Apache Arrow™ enables execution engines to take advantage of the latest SIM -D (Single input multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout of data also allows for a better use of CPU caches by placing all data relevant to a column operation in as compact of a format - as possible.</p> + <p>Apache Arrow™ enables execution engines to take advantage of + the latest SIMD (single instruction, multiple data) operations included in modern + processors, for native vectorized optimization of analytical data + processing. Columnar layout is optimized for data locality for better + performance on modern hardware like CPUs and GPUs.</p> + <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p> + </div> <div class="col-lg-4"> <h2>Flexible</h2> - <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Java, C, C++, Python, Ruby, and JavaScript implementations are in progress and more languages are welcome.</p> + <p>Arrow acts as a new high-performance interface between various + systems. 
It is also focused on supporting a wide variety of + industry-standard programming languages. Java, C, C++, Python, Ruby, + and JavaScript implementations are in progress and more languages are + welcome.</p> </div> <div class="col-lg-4"> <h2>Standard</h2> - <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p> + <p>Apache Arrow is backed by key developers of 13 major open source + projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, + Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it + the de-facto standard for columnar in-memory analytics.</p> </div> </div> <!-- close "row" div --> @@ -140,7 +151,7 @@ D (Single input multiple data) operations included in modern processors, for nat <img src="img/copy2.png" alt="common data layer" style="width:100%" /> <ul> <li>Each system has its own internal memory format</li> - <li>70-80% CPU wasted on serialization and deserialization</li> + <li>70-80% of computation wasted on serialization and deserialization</li> <li>Similar functionality implemented in multiple projects</li> </ul> </div> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/install/index.html ---------------------------------------------------------------------- diff --git a/install/index.html b/install/index.html index 2b11b73..2b894af 100644 --- a/install/index.html +++ b/install/index.html @@ -104,23 +104,23 @@ --> -<h2 id="current-version-060">Current Version: 0.6.0</h2> +<h2 id="current-version-070">Current Version: 0.7.0</h2> -<h3 id="released-14-august-2017">Released: 14 August 2017</h3> +<h3 id="released-17-september-2017">Released: 17 September 2017</h3> -<p>See the <a href="http://arrow.apache.org/release/0.6.0.html">release notes</a> for more about what's new.</p> +<p>See the <a
href="http://arrow.apache.org/release/0.7.0.html">release notes</a> for more about what's new.</p> <h3 id="source-release">Source release</h3> <ul> - <li><strong>Source Release</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz">apache-arrow-0.6.0.tar.gz</a></li> - <li><strong>Verification</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz.md5">md5</a>, <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz.asc">asc</a></li> - <li><a href="https://github.com/apache/arrow/releases/tag/apache-arrow-0.6.0">Git tag b173334</a></li> + <li><strong>Source Release</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz">apache-arrow-0.7.0.tar.gz</a></li> + <li><strong>Verification</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz.sha512">sha512</a>, <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz.asc">asc</a></li> + <li><a href="https://github.com/apache/arrow/releases/tag/apache-arrow-0.7.0">Git tag 97f9029</a></li> </ul> <h3 id="java-packages">Java Packages</h3> -<p><a href="http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.6.0%22">Java Artifacts on Maven Central</a></p> +<p><a href="http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.7.0%22">Java Artifacts on Maven Central</a></p> <h2 id="binary-installers-for-c-c-python">Binary Installers for C, C++, Python</h2> @@ -138,8 +138,8 @@ platforms:</p> <p>Install them with:</p> -<div class="language-shell highlighter-rouge"><pre class="highlight"><code>conda install arrow-cpp<span class="o">=</span>0.6.<span class="k">*</span> -c conda-forge -conda install <span class="nv">pyarrow</span><span class="o">==</span>0.6.<span class="k">*</span> -c conda-forge +<div class="language-shell
highlighter-rouge"><pre class="highlight"><code>conda install arrow-cpp<span class="o">=</span>0.7.<span class="k">*</span> -c conda-forge +conda install <span class="nv">pyarrow</span><span class="o">==</span>0.7.<span class="k">*</span> -c conda-forge </code></pre> </div> @@ -147,11 +147,11 @@ conda install <span class="nv">pyarrow</span><span class="o">==</span>0.6.<span <p>We have provided binary wheels on PyPI for Linux, macOS, and Windows:</p> -<div class="language-shell highlighter-rouge"><pre class="highlight"><code>pip install <span class="nv">pyarrow</span><span class="o">==</span>0.6.<span class="k">*</span> +<div class="language-shell highlighter-rouge"><pre class="highlight"><code>pip install <span class="nv">pyarrow</span><span class="o">==</span>0.7.<span class="k">*</span> </code></pre> </div> -<p>We recommend pinning <code class="highlighter-rouge">0.6.*</code> in <code class="highlighter-rouge">requirements.txt</code> to install the latest patch +<p>We recommend pinning <code class="highlighter-rouge">0.7.*</code> in <code class="highlighter-rouge">requirements.txt</code> to install the latest patch release.</p> <p>These include the Apache Arrow and Apache Parquet C++ binary libraries bundled @@ -225,6 +225,21 @@ sudo yum install -y --enablerepo<span class="o">=</span>epel parquet-glib-devel <a href="https://github.com/red-data-tools/arrow-packages">red-data-tools/arrow-packages</a>. If you have any feedback, please send it to that project instead of the Apache Arrow project.</p> +<h3 id="nightly-development-builds">Nightly Development Builds</h3> + +<p>To assist with development and debugging, some nightly builds are +available. These builds are not releases and are not necessarily produced on ASF +infrastructure. 
They are to be used strictly for development.</p> + +<ul> + <li><strong>conda packages</strong> for C++ and Python (Linux only)</li> +</ul> + +<div class="highlighter-rouge"><pre class="highlight"><code>conda install arrow-cpp -c twosigma +conda install pyarrow -c twosigma +</code></pre> +</div> + <hr/> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/29105b5e/release/index.html ---------------------------------------------------------------------- diff --git a/release/index.html b/release/index.html index 5904a85..5279b3f 100644 --- a/release/index.html +++ b/release/index.html @@ -109,6 +109,7 @@ <p>Navigate to the release page for downloads and the changelog.</p> <ul> + <li><a href="/release/0.7.0.html">0.7.0 (17 September 2017)</a></li> <li><a href="/release/0.6.0.html">0.6.0 (14 August 2017)</a></li> <li><a href="/release/0.5.0.html">0.5.0 (23 July 2017)</a></li> <li><a href="/release/0.4.1.html">0.4.1 (9 June 2017)</a></li>