Update site for 0.9.0
Project: http://git-wip-us.apache.org/repos/asf/arrow-site/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow-site/commit/74ed9477 Tree: http://git-wip-us.apache.org/repos/asf/arrow-site/tree/74ed9477 Diff: http://git-wip-us.apache.org/repos/asf/arrow-site/diff/74ed9477 Branch: refs/heads/asf-site Commit: 74ed94774f7aea92755cf9f0be6f0b1402018fc7 Parents: d37db9d Author: Wes McKinney <[email protected]> Authored: Thu Mar 22 09:01:44 2018 -0400 Committer: Wes McKinney <[email protected]> Committed: Thu Mar 22 09:01:44 2018 -0400 ---------------------------------------------------------------------- blog/2017/12/18/0.8.0-release/index.html | 2 +- .../12/19/java-vector-improvements/index.html | 1 + blog/2018/03/22/0.9.0-release/index.html | 222 ++++++++++ blog/2018/03/22/go-code-donation/index.html | 199 +++++++++ blog/index.html | 183 +++++++- docs/ipc.html | 47 +- docs/memory_layout.html | 11 +- docs/metadata.html | 3 +- feed.xml | 237 +++++----- img/native_go_implementation.png | Bin 0 -> 56186 bytes index.html | 2 +- install/index.html | 22 +- powered_by/index.html | 1 + release/0.9.0.html | 444 +++++++++++++++++++ release/index.html | 1 + 15 files changed, 1190 insertions(+), 185 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/blog/2017/12/18/0.8.0-release/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/12/18/0.8.0-release/index.html b/blog/2017/12/18/0.8.0-release/index.html index 2ca46fc..7ab1ec8 100644 --- a/blog/2017/12/18/0.8.0-release/index.html +++ b/blog/2017/12/18/0.8.0-release/index.html @@ -143,7 +143,7 @@ --> <p>The Apache Arrow team is pleased to announce the 0.8.0 release. 
It is the -product of 10 weeks of development andincludes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with +product of 10 weeks of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with many new features and bug fixes to the various language implementations. This is the largest release since 0.3.0 earlier this year.</p> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/blog/2017/12/19/java-vector-improvements/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/12/19/java-vector-improvements/index.html b/blog/2017/12/19/java-vector-improvements/index.html index 93ccfbd..2c54256 100644 --- a/blog/2017/12/19/java-vector-improvements/index.html +++ b/blog/2017/12/19/java-vector-improvements/index.html @@ -89,6 +89,7 @@ <li><a href="/docs/cpp">C++ API</a></li> <li><a href="/docs/java">Java API</a></li> <li><a href="/docs/c_glib">C GLib API</a></li> + <li><a href="/docs/js">Javascript API</a></li> </ul> </li> <!-- <li><a href="/blog">Blog</a></li> --> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/blog/2018/03/22/0.9.0-release/index.html ---------------------------------------------------------------------- diff --git a/blog/2018/03/22/0.9.0-release/index.html b/blog/2018/03/22/0.9.0-release/index.html new file mode 100644 index 0000000..b9f7815 --- /dev/null +++ b/blog/2018/03/22/0.9.0-release/index.html @@ -0,0 +1,222 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta 
name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a 
href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + <li><a href="/docs/js">Javascript API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img 
style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Apache Arrow 0.9.0 Release + <a href="/blog/2018/03/22/0.9.0-release/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 22 Mar 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://wesmckinney.com"><i class="fa fa-user"></i> Wes McKinney (wesm)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>The Apache Arrow team is pleased to announce the 0.9.0 release. It is the +product of over 3 months of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.9.0"><strong>260 resolved +JIRAs</strong></a>.</p> + +<p>While we made some backwards-incompatible columnar binary format changes in +last December’s 0.8.0 release, the 0.9.0 release is backwards-compatible with +0.8.0. We will be working toward a 1.0.0 release this year, which will mark +longer-term binary stability for the Arrow columnar format and metadata.</p> + +<p>See the <a href="https://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your +platform. The <a href="https://arrow.apache.org/release/0.8.0.html">complete changelog</a> is also available.</p> + +<p>We discuss some highlights from the release and other project news in this +post. 
This release has been overall focused more on bug fixes, compatibility, +and stability compared with previous releases which have pushed more on new and +expanded features.</p> + +<h2 id="new-arrow-committers-and-pmc-members">New Arrow committers and PMC members</h2> + +<p>Since the last release, we have added 2 new Arrow committers: <a href="https://github.com/theneuralbit">Brian +Hulette</a> and <a href="https://github.com/robertnishihara">Robert Nishihara</a>. Additionally, <a href="https://github.com/cpcloud">Phillip Cloud</a> and +<a href="https://github.com/pcmoritz">Philipp Moritz</a> have been promoted from committer to PMC +member. Congratulations and thank you for your contributions!</p> + +<h2 id="plasma-object-store-improvements">Plasma Object Store Improvements</h2> + +<p>The Plasma Object Store now supports managing interprocess shared memory on +CUDA-enabled GPUs. We are excited to see more GPU-related functionality develop +in Apache Arrow, as this has become a key computing environment for scalable +machine learning.</p> + +<h2 id="python-improvements">Python Improvements</h2> + +<p><a href="https://github.com/pitrou">Antoine Pitrou</a> has joined the Python development efforts and helped +significantly this release with interoperability with built-in CPython data +structures and NumPy structured data types.</p> + +<ul> + <li>New experimental support for reading Apache ORC files</li> + <li><code class="highlighter-rouge">pyarrow.array</code> now accepts lists of tuples or Python dicts for creating +Arrow struct type arrays.</li> + <li>NumPy structured dtypes (which are row/record-oriented) can be directly +converted to Arrow struct (column-oriented) arrays</li> + <li>Python 3.6 <code class="highlighter-rouge">pathlib</code> objects for file paths are now accepted in many file +APIs, including for Parquet files</li> + <li>Arrow integer arrays with nulls can now be converted to NumPy object arrays +with <code class="highlighter-rouge">None</code> 
values</li> + <li>New <code class="highlighter-rouge">pyarrow.foreign_buffer</code> API for interacting with memory blocks located +at particular memory addresses</li> +</ul> + +<h2 id="java-improvements">Java Improvements</h2> + +<p>Java now fully supports the <code class="highlighter-rouge">FixedSizeBinary</code> data type.</p> + +<h2 id="javascript-improvements">JavaScript Improvements</h2> + +<p>The JavaScript library has been significantly refactored and expanded. We are +making separate Apache releases (most recently <code class="highlighter-rouge">JS-0.3.1</code>) for JavaScript, +which are being <a href="https://www.npmjs.com/package/apache-arrow">published to NPM</a>.</p> + +<h2 id="upcoming-roadmap">Upcoming Roadmap</h2> + +<p>In the coming months, we will be working to move Apache Arrow closer to a 1.0.0 +release. We will also be discussing plans to develop native Arrow-based +computational libraries within the project.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/blog/2018/03/22/go-code-donation/index.html ---------------------------------------------------------------------- diff --git a/blog/2018/03/22/go-code-donation/index.html b/blog/2018/03/22/go-code-donation/index.html new file mode 100644 index 0000000..c20af9c --- /dev/null +++ b/blog/2018/03/22/go-code-donation/index.html @@ -0,0 +1,199 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The 
above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue 
Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + <li><a href="/docs/js">Javascript API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + 
</a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + A Native Go Library for Apache Arrow + <a href="/blog/2018/03/22/go-code-donation/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 22 Mar 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://github.com/pmc"><i class="fa fa-user"></i> The Apache Arrow PMC (pmc)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>Since launching in early 2016, Apache Arrow has been growing fast. We have made +nine major releases through the efforts of over 120 distinct contributors. The +project’s scope has also expanded. We began by focusing on the development of +the standardized in-memory columnar data format, which now serves as a pillar +of the project. Since then, we have been growing into a more general +cross-language platform for in-memory data analysis through new additions to +the project like the <a href="http://arrow.apache.org/blog/2017/08/16/0.6.0-release/">Plasma shared memory object store</a>. A primary goal of +the project is to enable data system developers to process and move data fast.</p> + +<p>So far, we officially have developed native Arrow implementations in C++, Java, +and JavaScript. We have created binding layers for the C++ libraries in C +(using the GLib libraries) and Python. We have also seen efforts to develop +interfaces to the Arrow C++ libraries in Go, Lua, Ruby, and Rust. 
While binding +layers serve many purposes, there can be benefits to native implementations, +and so we’ve been keen to see future work on native implementations in growing +systems languages like Go and Rust.</p> + +<p>This past October, engineers <a href="https://github.com/stuartcarnie">Stuart Carnie</a>, <a href="https://github.com/nathanielc">Nathaniel Cook</a>, and +<a href="https://github.com/goller">Chris Goller</a>, employees of <a href="https://influxdata.com">InfluxData</a>, began developing a native Go +language implementation of the <a href="https://github.com/influxdata/arrow">Apache Arrow</a> in-memory columnar format for +use in Go-based database systems like InfluxDB. We are excited to announce that +InfluxData has donated this native Go implementation to the Apache Arrow +project, where it will continue to be developed. This work features low-level +integration with the Go runtime and native support for SIMD instruction +sets. We are looking forward to working more closely with the Go community on +solving in-memory analytics and data interoperability problems.</p> + +<div align="center"> +<img src="/img/native_go_implementation.png" alt="Apache Arrow implementations and bindings" width="60%" class="img-responsive" /> +</div> + +<p>One of the mantras in <a href="https://www.apache.org">The Apache Software Foundation</a> is “Community over +Code”. By building an open and collaborative development community across many +programming language ecosystems, we will be able to develop better and +longer-lived solutions to the systems problems faced by data developers.</p> + +<p>We are excited for what the future holds for the Apache Arrow project. Adding +first-class support for a popular systems programming language like Go is an +important step along the way. We welcome others from the Go community to get +involved in the project. We also welcome others who wish to explore building +Arrow support for other programming languages not yet represented. 
Learn more +at <a href="https://arrow.apache.org">https://arrow.apache.org</a> and join the mailing list +<a href="https://lists.apache.org/[email protected]">[email protected]</a>.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/blog/index.html ---------------------------------------------------------------------- diff --git a/blog/index.html b/blog/index.html index c70cc40..ac80a96 100644 --- a/blog/index.html +++ b/blog/index.html @@ -124,6 +124,185 @@ <div class="container"> <h2> + A Native Go Library for Apache Arrow + <a href="/blog/2018/03/22/go-code-donation/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 22 Mar 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://github.com/pmc"><i class="fa fa-user"></i> The Apache Arrow PMC (pmc)</a> + </div> + </div> + </div> + <!-- + +--> + +<p>Since launching in early 2016, Apache Arrow has been growing fast. We have made +nine major releases through the efforts of over 120 distinct contributors. The +project’s scope has also expanded. We began by focusing on the development of +the standardized in-memory columnar data format, which now serves as a pillar +of the project. Since then, we have been growing into a more general +cross-language platform for in-memory data analysis through new additions to +the project like the <a href="http://arrow.apache.org/blog/2017/08/16/0.6.0-release/">Plasma shared memory object store</a>. 
A primary goal of +the project is to enable data system developers to process and move data fast.</p> + +<p>So far, we officially have developed native Arrow implementations in C++, Java, +and JavaScript. We have created binding layers for the C++ libraries in C +(using the GLib libraries) and Python. We have also seen efforts to develop +interfaces to the Arrow C++ libraries in Go, Lua, Ruby, and Rust. While binding +layers serve many purposes, there can be benefits to native implementations, +and so we’ve been keen to see future work on native implementations in growing +systems languages like Go and Rust.</p> + +<p>This past October, engineers <a href="https://github.com/stuartcarnie">Stuart Carnie</a>, <a href="https://github.com/nathanielc">Nathaniel Cook</a>, and +<a href="https://github.com/goller">Chris Goller</a>, employees of <a href="https://influxdata.com">InfluxData</a>, began developing a native Go +language implementation of the <a href="https://github.com/influxdata/arrow">Apache Arrow</a> in-memory columnar format for +use in Go-based database systems like InfluxDB. We are excited to announce that +InfluxData has donated this native Go implementation to the Apache Arrow +project, where it will continue to be developed. This work features low-level +integration with the Go runtime and native support for SIMD instruction +sets. We are looking forward to working more closely with the Go community on +solving in-memory analytics and data interoperability problems.</p> + +<div align="center"> +<img src="/img/native_go_implementation.png" alt="Apache Arrow implementations and bindings" width="60%" class="img-responsive" /> +</div> + +<p>One of the mantras in <a href="https://www.apache.org">The Apache Software Foundation</a> is “Community over +Code”. 
By building an open and collaborative development community across many +programming language ecosystems, we will be able to develop better and +longer-lived solutions to the systems problems faced by data developers.</p> + +<p>We are excited for what the future holds for the Apache Arrow project. Adding +first-class support for a popular systems programming language like Go is an +important step along the way. We welcome others from the Go community to get +involved in the project. We also welcome others who wish to explore building +Arrow support for other programming languages not yet represented. Learn more +at <a href="https://arrow.apache.org">https://arrow.apache.org</a> and join the mailing list +<a href="https://lists.apache.org/[email protected]">[email protected]</a>.</p> + + + </div> + + + + + + <div class="container"> + <h2> + Apache Arrow 0.9.0 Release + <a href="/blog/2018/03/22/0.9.0-release/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 22 Mar 2018 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://wesmckinney.com"><i class="fa fa-user"></i> Wes McKinney (wesm)</a> + </div> + </div> + </div> + <!-- + +--> + +<p>The Apache Arrow team is pleased to announce the 0.9.0 release. It is the +product of over 3 months of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.9.0"><strong>260 resolved +JIRAs</strong></a>.</p> + +<p>While we made some backwards-incompatible columnar binary format changes in +last December’s 0.8.0 release, the 0.9.0 release is backwards-compatible with +0.8.0. 
We will be working toward a 1.0.0 release this year, which will mark +longer-term binary stability for the Arrow columnar format and metadata.</p> + +<p>See the <a href="https://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your +platform. The <a href="https://arrow.apache.org/release/0.8.0.html">complete changelog</a> is also available.</p> + +<p>We discuss some highlights from the release and other project news in this +post. This release has been overall focused more on bug fixes, compatibility, +and stability compared with previous releases which have pushed more on new and +expanded features.</p> + +<h2 id="new-arrow-committers-and-pmc-members">New Arrow committers and PMC members</h2> + +<p>Since the last release, we have added 2 new Arrow committers: <a href="https://github.com/theneuralbit">Brian +Hulette</a> and <a href="https://github.com/robertnishihara">Robert Nishihara</a>. Additionally, <a href="https://github.com/cpcloud">Phillip Cloud</a> and +<a href="https://github.com/pcmoritz">Philipp Moritz</a> have been promoted from committer to PMC +member. Congratulations and thank you for your contributions!</p> + +<h2 id="plasma-object-store-improvements">Plasma Object Store Improvements</h2> + +<p>The Plasma Object Store now supports managing interprocess shared memory on +CUDA-enabled GPUs. 
We are excited to see more GPU-related functionality develop +in Apache Arrow, as this has become a key computing environment for scalable +machine learning.</p> + +<h2 id="python-improvements">Python Improvements</h2> + +<p><a href="https://github.com/pitrou">Antoine Pitrou</a> has joined the Python development efforts and helped +significantly this release with interoperability with built-in CPython data +structures and NumPy structured data types.</p> + +<ul> + <li>New experimental support for reading Apache ORC files</li> + <li><code class="highlighter-rouge">pyarrow.array</code> now accepts lists of tuples or Python dicts for creating +Arrow struct type arrays.</li> + <li>NumPy structured dtypes (which are row/record-oriented) can be directly +converted to Arrow struct (column-oriented) arrays</li> + <li>Python 3.6 <code class="highlighter-rouge">pathlib</code> objects for file paths are now accepted in many file +APIs, including for Parquet files</li> + <li>Arrow integer arrays with nulls can now be converted to NumPy object arrays +with <code class="highlighter-rouge">None</code> values</li> + <li>New <code class="highlighter-rouge">pyarrow.foreign_buffer</code> API for interacting with memory blocks located +at particular memory addresses</li> +</ul> + +<h2 id="java-improvements">Java Improvements</h2> + +<p>Java now fully supports the <code class="highlighter-rouge">FixedSizeBinary</code> data type.</p> + +<h2 id="javascript-improvements">JavaScript Improvements</h2> + +<p>The JavaScript library has been significantly refactored and expanded. We are +making separate Apache releases (most recently <code class="highlighter-rouge">JS-0.3.1</code>) for JavaScript, +which are being <a href="https://www.npmjs.com/package/apache-arrow">published to NPM</a>.</p> + +<h2 id="upcoming-roadmap">Upcoming Roadmap</h2> + +<p>In the coming months, we will be working to move Apache Arrow closer to a 1.0.0 +release. 
We will also be discussing plans to develop native Arrow-based +computational libraries within the project.</p> + + + </div> + + + + + + <div class="container"> + <h2> Apache Arrow 0.8.0 Release <a href="/blog/2017/12/18/0.8.0-release/" class="permalink" title="Permalink">∞</a> </h2> @@ -150,7 +329,7 @@ --> <p>The Apache Arrow team is pleased to announce the 0.8.0 release. It is the -product of 10 weeks of development andincludes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with +product of 10 weeks of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with many new features and bug fixes to the various language implementations. This is the largest release since 0.3.0 earlier this year.</p> @@ -310,7 +489,7 @@ implementations and bindings to more languages.</p> <div class="container"> <h2> Improvements to Java Vector API in Apache Arrow 0.8.0 - <a href="/blog/2017/12/18/java-vector-improvements/" class="permalink" title="Permalink">∞</a> + <a href="/blog/2017/12/19/java-vector-improvements/" class="permalink" title="Permalink">∞</a> </h2> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/docs/ipc.html ---------------------------------------------------------------------- diff --git a/docs/ipc.html b/docs/ipc.html index 5022c80..a825908 100644 --- a/docs/ipc.html +++ b/docs/ipc.html @@ -146,7 +146,7 @@ <ul> <li>A length prefix indicating the metadata size</li> - <li>The message metadata as a <a href="https://github.com/google/flatbuffers">Flatbuffer</a></li> + <li>The message metadata as a <a href="https://github.com/google]/flatbuffers">Flatbuffer</a></li> <li>Padding bytes to an 8-byte boundary</li> <li>The message body, which must be a 
multiple of 8 bytes</li> </ul> @@ -191,9 +191,7 @@ flatbuffer union), and the size of the message body:</p> of encapsulated messages, each of which follows the format above. The schema comes first in the stream, and it is the same for all of the record batches that follow. If any fields in the schema are dictionary-encoded, one or more -<code class="highlighter-rouge">DictionaryBatch</code> messages will be included. <code class="highlighter-rouge">DictionaryBatch</code> and -<code class="highlighter-rouge">RecordBatch</code> messages may be interleaved, but before any dictionary key is used -in a <code class="highlighter-rouge">RecordBatch</code> it should be defined in a <code class="highlighter-rouge">DictionaryBatch</code>.</p> +<code class="highlighter-rouge">DictionaryBatch</code> messages will follow the schema.</p> <div class="highlighter-rouge"><pre class="highlight"><code><SCHEMA> <DICTIONARY 0> @@ -201,10 +199,6 @@ in a <code class="highlighter-rouge">RecordBatch</code> it should be defined in <DICTIONARY k - 1> <RECORD BATCH 0> ... -<DICTIONARY x DELTA> -... -<DICTIONARY y DELTA> -... 
<RECORD BATCH n - 1> <EOS [optional]: int32> </code></pre> @@ -239,10 +233,6 @@ footer.</p> </code></pre> </div> -<p>In the file format, there is no requirement that dictionary keys should be -defined in a <code class="highlighter-rouge">DictionaryBatch</code> before they are used in a <code class="highlighter-rouge">RecordBatch</code>, as long -as the keys are defined somewhere in the file.</p> - <h3 id="recordbatch-body-structure">RecordBatch body structure</h3> <p>The <code class="highlighter-rouge">RecordBatch</code> metadata contains a depth-first (pre-order) flattened set of @@ -316,7 +306,6 @@ the dictionaries can be properly interpreted.</p> <div class="highlighter-rouge"><pre class="highlight"><code>table DictionaryBatch { id: long; data: RecordBatch; - isDelta: boolean = false; } </code></pre> </div> @@ -326,38 +315,6 @@ in the schema, so that dictionaries can even be used for multiple fields. See the <a href="https://github.com/apache/arrow/blob/master/format/Layout.md">Physical Layout</a> document for more about the semantics of dictionary-encoded data.</p> -<p>The dictionary <code class="highlighter-rouge">isDelta</code> flag allows dictionary batches to be modified -mid-stream. A dictionary batch with <code class="highlighter-rouge">isDelta</code> set indicates that its vector -should be concatenated with those of any previous batches with the same <code class="highlighter-rouge">id</code>. 
A -stream which encodes one column, the list of strings -<code class="highlighter-rouge">["A", "B", "C", "B", "D", "C", "E", "A"]</code>, with a delta dictionary batch could -take the form:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code><SCHEMA> -<DICTIONARY 0> -(0) "A" -(1) "B" -(2) "C" - -<RECORD BATCH 0> -0 -1 -2 -1 - -<DICTIONARY 0 DELTA> -(3) "D" -(4) "E" - -<RECORD BATCH 1> -3 -2 -4 -0 -EOS -</code></pre> -</div> - <h3 id="tensor-multi-dimensional-array-message-format">Tensor (Multi-dimensional Array) Message Format</h3> <p>The <code class="highlighter-rouge">Tensor</code> message types provides a way to write a multidimensional array of http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/docs/memory_layout.html ---------------------------------------------------------------------- diff --git a/docs/memory_layout.html b/docs/memory_layout.html index ff8f9e8..10fc82c 100644 --- a/docs/memory_layout.html +++ b/docs/memory_layout.html @@ -162,8 +162,9 @@ from <code class="highlighter-rouge">List<V></code> iff U and V are differ or a fully-specified nested type. When we say slot we mean a relative type value, not necessarily any physical storage region.</li> <li>Logical type: A data type that is implemented using some relative (physical) -type. For example, Decimal values are stored as 16 bytes in a fixed byte -size array. Similarly, strings can be stored as <code class="highlighter-rouge">List<1-byte></code>.</li> +type. For example, a Decimal value stored in 16 bytes could be stored in a +primitive array with slot size 16 bytes. Similarly, strings can be stored as +<code class="highlighter-rouge">List<1-byte></code>.</li> <li>Parent and child arrays: names to express relationships between physical value arrays in a nested type structure. 
For example, a <code class="highlighter-rouge">List<T></code>-type parent array has a T-type array as its child (see more on lists below).</li> @@ -752,9 +753,9 @@ the the types array indicates that a slot contains a different type at the index <h2 id="dictionary-encoding">Dictionary encoding</h2> <p>When a field is dictionary encoded, the values are represented by an array of Int32 representing the index of the value in the dictionary. -The Dictionary is received as one or more DictionaryBatches with the id referenced by a dictionary attribute defined in the metadata (<a href="https://github.com/apache/arrow/blob/master/format/Message.fbs">Message.fbs</a>) in the Field table. -The dictionary has the same layout as the type of the field would dictate. Each entry in the dictionary can be accessed by its index in the DictionaryBatches. -When a Schema references a Dictionary id, it must send at least one DictionaryBatch for this id.</p> +The Dictionary is received as a DictionaryBatch whose id is referenced by a dictionary attribute defined in the metadata (<a href="https://github.com/apache/arrow/blob/master/format/Message.fbs">Message.fbs</a>) in the Field table. +The dictionary has the same layout as the type of the field would dictate. Each entry in the dictionary can be accessed by its index in the DictionaryBatch. 
+When a Schema references a Dictionary id, it must send a DictionaryBatch for this id before any RecordBatch.</p> <p>As an example, you could have the following data:</p> <div class="highlighter-rouge"><pre class="highlight"><code>type: List<String> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/docs/metadata.html ---------------------------------------------------------------------- diff --git a/docs/metadata.html b/docs/metadata.html index 858f0c0..df36202 100644 --- a/docs/metadata.html +++ b/docs/metadata.html @@ -531,8 +531,7 @@ logical type, which have no children) and 3 buffers:</p> <h3 id="decimal">Decimal</h3> -<p>Decimals are represented as a 2’s complement 128-bit (16 byte) signed integer -in little-endian byte order.</p> +<p>TBD</p> <h3 id="timestamp">Timestamp</h3> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/feed.xml ---------------------------------------------------------------------- diff --git a/feed.xml b/feed.xml index 27952f7..d4d6c6f 100644 --- a/feed.xml +++ b/feed.xml @@ -1,9 +1,124 @@ -<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-02-23T11:25:04-05:00</updated><id>/</id><entry><title type="html">Apache Arrow 0.8.0 Release</title><link href="/blog/2017/12/18/0.8.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.8.0 Release" /><published>2017-12-18T23:01:00-05:00</published><updated>2017-12-18T23:01:00-05:00</updated><id>/blog/2017/12/18/0.8.0-release</id><content type="html" xml:base="/blog/2017/12/18/0.8.0-release/"><!-- +<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" 
rel="alternate" type="text/html" /><updated>2018-03-22T09:01:09-04:00</updated><id>/</id><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- + +--> + +<p>Since launching in early 2016, Apache Arrow has been growing fast. We have made +nine major releases through the efforts of over 120 distinct contributors. The +project’s scope has also expanded. We began by focusing on the development of +the standardized in-memory columnar data format, which now serves as a pillar +of the project. Since then, we have been growing into a more general +cross-language platform for in-memory data analysis through new additions to +the project like the <a href="http://arrow.apache.org/blog/2017/08/16/0.6.0-release/">Plasma shared memory object store</a>. A primary goal of +the project is to enable data system developers to process and move data fast.</p> + +<p>So far, we officially have developed native Arrow implementations in C++, Java, +and JavaScript. We have created binding layers for the C++ libraries in C +(using the GLib libraries) and Python. We have also seen efforts to develop +interfaces to the Arrow C++ libraries in Go, Lua, Ruby, and Rust. 
While binding +layers serve many purposes, there can be benefits to native implementations, +and so we’ve been keen to see future work on native implementations in growing +systems languages like Go and Rust.</p> + +<p>This past October, engineers <a href="https://github.com/stuartcarnie">Stuart Carnie</a>, <a href="https://github.com/nathanielc">Nathaniel Cook</a>, and +<a href="https://github.com/goller">Chris Goller</a>, employees of <a href="https://influxdata.com">InfluxData</a>, began developing a native Go +language implementation of the <a href="https://github.com/influxdata/arrow">Apache Arrow</a> in-memory columnar format for +use in Go-based database systems like InfluxDB. We are excited to announce that +InfluxData has donated this native Go implementation to the Apache Arrow +project, where it will continue to be developed. This work features low-level +integration with the Go runtime and native support for SIMD instruction +sets. We are looking forward to working more closely with the Go community on +solving in-memory analytics and data interoperability problems.</p> + +<div align="center"> +<img src="/img/native_go_implementation.png" alt="Apache Arrow implementations and bindings" width="60%" class="img-responsive" /> +</div> + +<p>One of the mantras in <a href="https://www.apache.org">The Apache Software Foundation</a> is “Community over +Code”. By building an open and collaborative development community across many +programming language ecosystems, we will be able to develop better and +longer-lived solutions to the systems problems faced by data developers.</p> + +<p>We are excited for what the future holds for the Apache Arrow project. Adding +first-class support for a popular systems programming language like Go is an +important step along the way. We welcome others from the Go community to get +involved in the project. We also welcome others who wish to explore building +Arrow support for other programming languages not yet represented. 
Learn more +at <a href="https://arrow.apache.org">https://arrow.apache.org</a> and join the mailing list +<a href="https://lists.apache.org/[email protected]">[email protected]</a>.</p></content><author><name>pmc</name></author></entry><entry><title type="html">Apache Arrow 0.9.0 Release</title><link href="/blog/2018/03/22/0.9.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.9.0 Release" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/0.9.0-release</id><content type="html" xml:base="/blog/2018/03/22/0.9.0-release/"><!-- + +--> + +<p>The Apache Arrow team is pleased to announce the 0.9.0 release. It is the +product of over 3 months of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.9.0"><strong>260 resolved +JIRAs</strong></a>.</p> + +<p>While we made some backwards-incompatible columnar binary format changes in +last December’s 0.8.0 release, the 0.9.0 release is backwards-compatible with +0.8.0. We will be working toward a 1.0.0 release this year, which will mark +longer-term binary stability for the Arrow columnar format and metadata.</p> + +<p>See the <a href="https://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your +platform. The <a href="https://arrow.apache.org/release/0.8.0.html">complete changelog</a> is also available.</p> + +<p>We discuss some highlights from the release and other project news in this +post. 
This release has been overall focused more on bug fixes, compatibility, +and stability compared with previous releases which have pushed more on new and +expanded features.</p> + +<h2 id="new-arrow-committers-and-pmc-members">New Arrow committers and PMC members</h2> + +<p>Since the last release, we have added 2 new Arrow committers: <a href="https://github.com/theneuralbit">Brian +Hulette</a> and <a href="https://github.com/robertnishihara">Robert Nishihara</a>. Additionally, <a href="https://github.com/cpcloud">Phillip Cloud</a> and +<a href="https://github.com/pcmoritz">Philipp Moritz</a> have been promoted from committer to PMC +member. Congratulations and thank you for your contributions!</p> + +<h2 id="plasma-object-store-improvements">Plasma Object Store Improvements</h2> + +<p>The Plasma Object Store now supports managing interprocess shared memory on +CUDA-enabled GPUs. We are excited to see more GPU-related functionality develop +in Apache Arrow, as this has become a key computing environment for scalable +machine learning.</p> + +<h2 id="python-improvements">Python Improvements</h2> + +<p><a href="https://github.com/pitrou">Antoine Pitrou</a> has joined the Python development efforts and helped +significantly this release with interoperability with built-in CPython data +structures and NumPy structured data types.</p> + +<ul> + <li>New experimental support for reading Apache ORC files</li> + <li><code class="highlighter-rouge">pyarrow.array</code> now accepts lists of tuples or Python dicts for creating +Arrow struct type arrays.</li> + <li>NumPy structured dtypes (which are row/record-oriented) can be directly +converted to Arrow struct (column-oriented) arrays</li> + <li>Python 3.6 <code class="highlighter-rouge">pathlib</code> objects for file paths are now accepted in many file +APIs, including for Parquet files</li> + <li>Arrow integer arrays with nulls can now be converted to NumPy object arrays +with <code class="highlighter-rouge">None</code> 
values</li> + <li>New <code class="highlighter-rouge">pyarrow.foreign_buffer</code> API for interacting with memory blocks located +at particular memory addresses</li> +</ul> + +<h2 id="java-improvements">Java Improvements</h2> + +<p>Java now fully supports the <code class="highlighter-rouge">FixedSizeBinary</code> data type.</p> + +<h2 id="javascript-improvements">JavaScript Improvements</h2> + +<p>The JavaScript library has been significantly refactored and expanded. We are +making separate Apache releases (most recently <code class="highlighter-rouge">JS-0.3.1</code>) for JavaScript, +which are being <a href="https://www.npmjs.com/package/apache-arrow">published to NPM</a>.</p> + +<h2 id="upcoming-roadmap">Upcoming Roadmap</h2> + +<p>In the coming months, we will be working to move Apache Arrow closer to a 1.0.0 +release. We will also be discussing plans to develop native Arrow-based +computational libraries within the project.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Apache Arrow 0.8.0 Release</title><link href="/blog/2017/12/18/0.8.0-release/" rel="alternate" type="text/html" title="Apache Arrow 0.8.0 Release" /><published>2017-12-18T23:01:00-05:00</published><updated>2017-12-18T23:01:00-05:00</updated><id>/blog/2017/12/18/0.8.0-release</id><content type="html" xml:base="/blog/2017/12/18/0.8.0-release/"><!-- --> <p>The Apache Arrow team is pleased to announce the 0.8.0 release. 
It is the -product of 10 weeks of development andincludes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with +product of 10 weeks of development and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.8.0"><strong>286 resolved JIRAs</strong></a> with many new features and bug fixes to the various language implementations. This is the largest release since 0.3.0 earlier this year.</p> @@ -151,7 +266,7 @@ working to improve and expand the libraries in support of downstream use cases.& <p>We continue to look for more JavaScript, Julia, R, Rust, and other programming language developers to join the project and expand the available -implementations and bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Improvements to Java Vector API in Apache Arrow 0.8.0</title><link href="/blog/2017/12/18/java-vector-improvements/" rel="alternate" type="text/html" title="Improvements to Java Vector API in Apache Arrow 0.8.0" /><published>2017-12-18T19:00:00-05:00</published><updated>2017-12-18T19:00:00-05:00</updated><id>/blog/2017/12/18/java-vector-improvements</id><content type="html" xml:base="/blog/2017/12/18/java-vector-improvements/"><!-- +implementations and bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Improvements to Java Vector API in Apache Arrow 0.8.0</title><link href="/blog/2017/12/19/java-vector-improvements/" rel="alternate" type="text/html" title="Improvements to Java Vector API in Apache Arrow 0.8.0" /><published>2017-12-18T19:00:00-05:00</published><updated>2017-12-18T19:00:00-05:00</updated><id>/blog/2017/12/19/java-vector-improvements</id><content type="html" 
xml:base="/blog/2017/12/19/java-vector-improvements/"><!-- --> @@ -1073,118 +1188,4 @@ systems to improve their processing performance and interoperability with other systems.</p> <p>We are discussing the roadmap to a future 1.0.0 release on the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">developer -mailing list</a>. Please join the discussion there.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Connecting Relational Databases to the Apache Arrow World with turbodbc</title><link href="/blog/2017/06/16/turbodbc-arrow/" rel="alternate" type="text/html" title="Connecting Relational Databases to the Apache Arrow World with turbodbc" /><published>2017-06-16T04:00:00-04:00</published><updated>2017-06-16T04:00:00-04:00</updated><id>/blog/2017/06/16/turbodbc-arrow</id><content type="html" xml:base="/blog/2017/06/16/turbodbc-arrow/"><!-- - ---> - -<p><em><a href="https://github.com/mathmagique">Michael König</a> is the lead developer of the <a href="https://github.com/blue-yonder/turbodbc">turbodbc project</a></em></p> - -<p>The <a href="https://arrow.apache.org/">Apache Arrow</a> project set out to become the universal data layer for -column-oriented data processing systems without incurring serialization costs -or compromising on performance on a more general level. While relational -databases still lag behind in Apache Arrow adoption, the Python database module -<a href="https://github.com/blue-yonder/turbodbc">turbodbc</a> brings Apache Arrow support to these databases using a much -older, more specialized data exchange layer: <a href="https://en.wikipedia.org/wiki/Open_Database_Connectivity">ODBC</a>.</p> - -<p>ODBC is a database interface that offers developers the option to transfer data -either in row-wise or column-wise fashion. Previous Python ODBC modules typically -use the row-wise approach, and often trade repeated database roundtrips for simplified -buffer handling. 
This makes them less suited for data-intensive applications, -particularly when interfacing with modern columnar analytical databases.</p> - -<p>In contrast, turbodbc was designed to leverage columnar data processing from day -one. Naturally, this implies using the columnar portion of the ODBC API. Equally -important, however, is to find new ways of providing columnar data to Python users -that exceed the capabilities of the row-wise API mandated by Python’s <a href="https://www.python.org/dev/peps/pep-0249/">PEP 249</a>. -Turbodbc has adopted Apache Arrow for this very task with the recently released -version 2.0.0:</p> - -<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">turbodbc</span> <span class="kn">import</span> <span class="n">connect</span> -<span class="o">&gt;&gt;&gt;</span> <span class="n">connection</span> <span class="o">=</span> <span class="n">connect</span><span class="p">(</span><span class="n">dsn</span><span class="o">=</span><span class="s">"My columnar database"</span><span class="p">)</span> -<span class="o">&gt;&gt;&gt;</span> <span class="n">cursor</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span> -<span class="o">&gt;&gt;&gt;</span> <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"SELECT some_integers, some_strings FROM my_table"</span><span class="p">)</span> -<span class="o">&gt;&gt;&gt;</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchallarrow</span><span class="p">()</span> -<span class="n">pyarrow</span><span class="o">.</span><span class="n">Table</span> -<span class="n">some_integers</span><span class="p">:</span> <span class="n">int64</span> -<span class="n">some_strings</span><span class="p">:</span> <span 
class="n">string</span> -</code></pre> -</div> - -<p>With this new addition, the data flow for a result set of a typical SELECT query -is like this:</p> -<ul> - <li>The database prepares the result set and exposes it to the ODBC driver using -either row-wise or column-wise storage.</li> - <li>Turbodbc has the ODBC driver write chunks of the result set into columnar buffers.</li> - <li>These buffers are exposed to turbodbc’s Apache Arrow frontend. This frontend -will create an Arrow table and fill in the buffered values.</li> - <li>The previous steps are repeated until the entire result set is retrieved.</li> -</ul> - -<p><img src="/img/turbodbc_arrow.png" alt="Data flow from relational databases to Python with turbodbc and the Apache Arrow frontend" class="img-responsive" width="75%" /></p> - -<p>In practice, it is possible to achieve the following ideal situation: A 64-bit integer -column is stored as one contiguous block of memory in a columnar database. A huge chunk -of 64-bit integers is transferred over the network and the ODBC driver directly writes -it to a turbodbc buffer of 64-bit integers. The Arrow frontend accumulates these values -by copying the entire 64-bit buffer into a free portion of an Arrow table’s 64-bit -integer column.</p> - -<p>Moving data from the database to an Arrow table and, thus, providing it to the Python -user can be as simple as copying memory blocks around, megabytes equivalent to hundred -thousands of rows at a time. The absence of serialization and conversion logic renders -the process extremely efficient.</p> - -<p>Once the data is stored in an Arrow table, Python users can continue to do some -actual work. 
They can convert it into a <a href="https://arrow.apache.org/docs/python/pandas.html">Pandas DataFrame</a> for data analysis -(using a quick <code class="highlighter-rouge">table.to_pandas()</code>), pass it on to other data processing -systems such as <a href="http://spark.apache.org/">Apache Spark</a> or <a href="http://impala.apache.org/">Apache Impala (incubating)</a>, or store -it in the <a href="http://parquet.apache.org/">Apache Parquet</a> file format. This way, non-Python systems are -efficiently connected with relational databases.</p> - -<p>In the future, turbodbc’s Arrow support will be extended to use more -sophisticated features such as <a href="https://arrow.apache.org/docs/memory_layout.html#dictionary-encoding">dictionary-encoded</a> string fields. We also -plan to pick smaller than 64-bit <a href="https://arrow.apache.org/docs/metadata.html#integers">data types</a> where possible. Last but not -least, Arrow support will be extended to cover the reverse direction of data -flow, so that Python users can quickly insert Arrow tables into relational -databases.</p> - -<p>If you would like to learn more about turbodbc, check out the <a href="https://github.com/blue-yonder/turbodbc">GitHub project</a> and the -<a href="http://turbodbc.readthedocs.io/">project documentation</a>. 
If you want to learn more about how turbodbc implements the -nitty-gritty details, check out parts <a href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/">one</a> and <a href="https://tech.blue-yonder.com/making-of-turbodbc-part-2-c-to-python/">two</a> of the -<a href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/">“Making of turbodbc”</a> series at <a href="https://tech.blue-yonder.com/">Blue Yonder’s technology blog</a>.</p></content><author><name>MathMagique</name></author></entry><entry><title type="html">Apache Arrow 0.4.1 Release</title><link href="/blog/2017/06/14/0.4.1-release/" rel="alternate" type="text/html" title="Apache Arrow 0.4.1 Release" /><published>2017-06-14T10:00:00-04:00</published><updated>2017-06-14T10:00:00-04:00</updated><id>/blog/2017/06/14/0.4.1-release</id><content type="html" xml:base="/blog/2017/06/14/0.4.1-release/"><!-- - ---> - -<p>The Apache Arrow team is pleased to announce the 0.4.1 release of the -project. This is a bug fix release that addresses a regression with Decimal -types in the Java implementation introduced in 0.4.0 (see -<a href="https://issues.apache.org/jira/browse/ARROW-1091">ARROW-1091</a>). There were a total of <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.4.1">31 resolved JIRAs</a>.</p> - -<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your platform.</p> - -<h3 id="python-wheel-installers-for-windows">Python Wheel Installers for Windows</h3> - -<p>Max Risuhin contributed fixes to enable binary wheel installers to be generated -for Python 3.5 and 3.6. 
Thus, 0.4.1 is the first Arrow release for which -PyArrow including bundled <a href="http://parquet.apache.org">Apache Parquet</a> support that can be installed -with either conda or pip across the 3 major platforms: Linux, macOS, and -Windows. Use one of:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>pip install pyarrow -conda install pyarrow -c conda-forge -</code></pre> -</div> - -<h3 id="turbodbc-200-with-apache-arrow-support">Turbodbc 2.0.0 with Apache Arrow Support</h3> - -<p><a href="http://turbodbc.readthedocs.io/">Turbodbc</a>, a fast C++ ODBC interface with Python bindings, released -version 2.0.0 including reading SQL result sets as Arrow record batches. The -team used the PyArrow C++ API introduced in version 0.4.0 to construct -<code class="highlighter-rouge">pyarrow.Table</code> objects inside the <code class="highlighter-rouge">turbodbc</code> library. Learn more in their -<a href="http://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#apache-arrow-support">documentation</a> and install with one of:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>pip install turbodbc -conda install turbodbc -c conda-forge -</code></pre> -</div></content><author><name>wesm</name></author></entry></feed> \ No newline at end of file +mailing list</a>. 
Please join the discussion there.</p></content><author><name>wesm</name></author></entry></feed> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/img/native_go_implementation.png ---------------------------------------------------------------------- diff --git a/img/native_go_implementation.png b/img/native_go_implementation.png new file mode 100644 index 0000000..39f0952 Binary files /dev/null and b/img/native_go_implementation.png differ http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/index.html ---------------------------------------------------------------------- diff --git a/index.html b/index.html index 6c726c5..a4b137f 100644 --- a/index.html +++ b/index.html @@ -120,7 +120,7 @@ <p class="lead">A cross-language development platform for in-memory data</p> <p> <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:[email protected]" role="button">Join Mailing List</a> - <a class="btn btn-lg btn-primary" style="white-space: normal;" href="/install/" role="button">Install (0.8.0 Release - 18 December 2017)</a> + <a class="btn btn-lg btn-primary" style="white-space: normal;" href="/install/" role="button">Install (0.9.0 Release - 21 March 2018)</a> </p> </div> <div class="row"> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/install/index.html ---------------------------------------------------------------------- diff --git a/install/index.html b/install/index.html index 27f848c..28f678c 100644 --- a/install/index.html +++ b/install/index.html @@ -118,24 +118,24 @@ --> -<h2 id="current-version-080">Current Version: 0.8.0</h2> +<h2 id="current-version-090">Current Version: 0.9.0</h2> -<h3 id="released-18-december-2017">Released: 18 December 2017</h3> +<h3 id="released-21-march-2018">Released: 21 March 2018</h3> -<p>See the <a href="http://arrow.apache.org/release/0.8.0.html">release notes</a> for more about what’s new.</p> +<p>See the <a 
href="http://arrow.apache.org/release/0.9.0.html">release notes</a> for more about what’s new.</p> <h3 id="source-release">Source release</h3> <ul> - <li><strong>Source Release</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz">apache-arrow-0.8.0.tar.gz</a></li> - <li><strong>Verification</strong>: <a href="https://www.apache.org/dist/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.sha512">sha512</a>, <a href="https://www.apache.org/dist/arrow/arrow-0.8.0/apache-arrow-0.8.0.tar.gz.asc">asc</a> (<a href="https://www.apache.org/dyn/closer.cgi#verify">verification instructions</a>)</li> - <li><a href="https://github.com/apache/arrow/releases/tag/apache-arrow-0.8.0">Git tag 1d689e5</a></li> + <li><strong>Source Release</strong>: <a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.9.0/apache-arrow-0.9.0.tar.gz">apache-arrow-0.9.0.tar.gz</a></li> + <li><strong>Verification</strong>: <a href="https://www.apache.org/dist/arrow/arrow-0.9.0/apache-arrow-0.9.0.tar.gz.sha512">sha512</a>, <a href="https://www.apache.org/dist/arrow/arrow-0.9.0/apache-arrow-0.9.0.tar.gz.asc">asc</a> (<a href="https://www.apache.org/dyn/closer.cgi#verify">verification instructions</a>)</li> + <li><a href="https://github.com/apache/arrow/releases/tag/apache-arrow-0.9.0">Git tag c695a5d</a></li> <li><a href="http://www.apache.org/dist/arrow/KEYS">PGP keys for release signatures</a></li> </ul> <h3 id="java-packages">Java Packages</h3> -<p><a href="http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.8.0%22">Java Artifacts on Maven Central</a></p> +<p><a href="http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.9.0%22">Java Artifacts on Maven Central</a></p> <h2 id="binary-installers-for-c-c-python">Binary Installers for C, C++, Python</h2> @@ -153,8 +153,8 @@ platforms:</p> <p>Install them with:</p> -<div class="language-shell highlighter-rouge"><pre 
class="highlight"><code>conda install arrow-cpp<span class="o">=</span>0.8.<span class="k">*</span> -c conda-forge -conda install <span class="nv">pyarrow</span><span class="o">=</span>0.8.<span class="k">*</span> -c conda-forge +<div class="language-shell highlighter-rouge"><pre class="highlight"><code>conda install arrow-cpp<span class="o">=</span>0.9.<span class="k">*</span> -c conda-forge +conda install <span class="nv">pyarrow</span><span class="o">=</span>0.9.<span class="k">*</span> -c conda-forge </code></pre> </div> @@ -162,11 +162,11 @@ conda install <span class="nv">pyarrow</span><span class="o">=</span>0.8.<span c <p>We have provided binary wheels on PyPI for Linux, macOS, and Windows:</p> -<div class="language-shell highlighter-rouge"><pre class="highlight"><code>pip install <span class="nv">pyarrow</span><span class="o">==</span>0.8.<span class="k">*</span> +<div class="language-shell highlighter-rouge"><pre class="highlight"><code>pip install <span class="nv">pyarrow</span><span class="o">==</span>0.9.<span class="k">*</span> </code></pre> </div> -<p>We recommend pinning <code class="highlighter-rouge">0.8.*</code> in <code class="highlighter-rouge">requirements.txt</code> to install the latest patch +<p>We recommend pinning <code class="highlighter-rouge">0.9.*</code> in <code class="highlighter-rouge">requirements.txt</code> to install the latest patch release.</p> <p>These include the Apache Arrow and Apache Parquet C++ binary libraries bundled http://git-wip-us.apache.org/repos/asf/arrow-site/blob/74ed9477/powered_by/index.html ---------------------------------------------------------------------- diff --git a/powered_by/index.html b/powered_by/index.html index 0fb59f2..2b2b111 100644 --- a/powered_by/index.html +++ b/powered_by/index.html @@ -193,6 +193,7 @@ handles. 
This work is part of the <a href="https://gpuopenanalytics.com/">GPU Op <li><strong><a href="https://pandas.pydata.org">pandas</a>:</strong> data analysis toolkit for Python programmers. pandas supports reading and writing Parquet files using pyarrow. Several pandas core developers are also contributors to Apache Arrow.</li> + <li><strong><a href="https://github.com/jpmorganchase/perspective">Perspective</a>:</strong> Perspective is a streaming data visualization engine in JavaScript for building real-time & user-configurable analytics entirely in the browser.</li> <li><strong><a href="https://quiltdata.com/">Quilt Data</a>:</strong> Quilt is a data package manager, designed to make managing data as easy as managing code. It supports Parquet format via pyarrow for data access.</li>
