Update site for 0.8.0
Project: http://git-wip-us.apache.org/repos/asf/arrow-site/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow-site/commit/61e9ea7e Tree: http://git-wip-us.apache.org/repos/asf/arrow-site/tree/61e9ea7e Diff: http://git-wip-us.apache.org/repos/asf/arrow-site/diff/61e9ea7e Branch: refs/heads/asf-site Commit: 61e9ea7e23eced764d5d327f469bf513fe2f37d6 Parents: 35611f8 Author: Jacques Nadeau <[email protected]> Authored: Sun Dec 17 22:03:06 2017 -0800 Committer: Jacques Nadeau <[email protected]> Committed: Mon Dec 18 13:33:54 2017 -0800 ---------------------------------------------------------------------- blog/2017/05/07/0.3-release-japanese/index.html | 288 ++++++++++++ blog/2017/05/07/0.3-release/index.html | 364 +++++++++++++++ blog/2017/05/22/0.4.0-release/index.html | 225 +++++++++ blog/2017/06/14/0.4.1-release/index.html | 1 + blog/2017/06/16/turbodbc-arrow/index.html | 1 + blog/2017/07/24/0.5.0-release/index.html | 235 ++++++++++ blog/2017/07/26/spark-arrow/index.html | 7 +- .../07/plasma-in-memory-object-store/index.html | 273 +++++++++++ blog/2017/08/15/0.6.0-release/index.html | 234 ++++++++++ blog/2017/09/18/0.7.0-release/index.html | 311 ++++++++++++ .../index.html | 1 + blog/index.html | 33 +- committers/index.html | 1 + css/main.css | 2 +- docs/ipc.html | 48 +- docs/memory_layout.html | 12 +- docs/metadata.html | 4 +- feed.xml | 28 +- index.html | 3 +- install/index.html | 57 ++- powered_by/index.html | 231 +++++++++ release/0.1.0.html | 1 + release/0.2.0.html | 1 + release/0.3.0.html | 1 + release/0.4.0.html | 1 + release/0.4.1.html | 1 + release/0.5.0.html | 1 + release/0.6.0.html | 1 + release/0.7.0.html | 1 + release/0.7.1.html | 1 + release/0.8.0.html | 468 +++++++++++++++++++ release/index.html | 2 + 32 files changed, 2770 insertions(+), 68 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/05/07/0.3-release-japanese/index.html 
---------------------------------------------------------------------- diff --git a/blog/2017/05/07/0.3-release-japanese/index.html b/blog/2017/05/07/0.3-release-japanese/index.html new file mode 100644 index 0000000..20aaabd --- /dev/null +++ b/blog/2017/05/07/0.3-release-japanese/index.html @@ -0,0 +1,288 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" 
href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + 
 aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Apache Arrow 0.3.0 Release + <a href="/blog/2017/05/07/0.3-release-japanese/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 07 May 2017 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://wesmckinney.com"><i class="fa fa-user"></i> Wes McKinney (wesm)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p><a href="/blog/2017/05/07/0.3-release/">Original (English)</a></p> + +<p>The Apache Arrow team is pleased to announce the 0.3.0 release. This release is the result of 10 weeks of active development since the 0.2.0 release in February. <a href="https://github.com/apache/arrow/graphs/contributors"><strong>23 contributors</strong></a> resolved <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.3.0"><strong>306 JIRA issues</strong></a>.</p> + +<p>We have added many new features across the different Arrow implementations. In 2017, development has focused in particular on the in-memory format, type metadata, and messaging protocol, in order to provide a <strong>stable, production-ready foundation</strong> for big data applications. We are excited to be collaborating with the <a href="http://spark.apache.org">Apache 
Spark</a> and <a href="http://www.geomesa.org/">GeoMesa</a> communities on using Arrow for high-performance IO and in-memory data processing.</p> + +<p>See the <a href="http://arrow.apache.org/install">install page</a> to learn how to use Arrow on your platform.</p> + +<p>We plan to publish the Apache Arrow roadmap soon, as we work to expand the use of Arrow for accelerating big data systems.</p> + +<p>We are looking for contributors to join Arrow development, both from communities already involved in Arrow and from communities not yet involved, such as Go, R, and Julia.</p> + +<h3 id="ファイルフォーマットとストリーミングフォーマットの強化">File and Streaming Format Hardening</h3> + +<p>The 0.2.0 release introduced the <strong>random access</strong> and <strong>streaming</strong> Arrow wire formats. See the <a href="http://arrow.apache.org/docs/ipc.html">IPC specification</a> for implementation details and this <a href="http://wesmckinney.com/blog/arrow-streaming-columnar/">blog post with example use cases</a>. These formats provide low-overhead, zero-copy access to Arrow record batch payloads.</p> + +<p>In 0.3.0 we solidified many small details of the binary format. We also improved integration testing between Java, C++, and Python, as well as unit testing in each language. <a href="http://github.com/google/flatbuffers">Google Flatbuffers</a> has been very helpful for adding new features to the metadata without breaking forward compatibility.</p> + +<p>We are not yet ready to promise that forward compatibility of the binary format will never break (we may still find something that needs to change), but we will make an effort to avoid unnecessary breakage between major releases. Contributions to the Apache Arrow website, to each component’s user documentation, and to the API documentation are very welcome.</p> + +<h3 id="辞書エンコーディングのサポート">Dictionary Encoding Support</h3> + +<p><a href="https://github.com/elahrvivaz">Emilio Lahr-Vivaz</a> of the <a href="http://www.geomesa.org/">GeoMesa</a> project contributed support for dictionary-encoded vectors to the Java Arrow implementation. We followed up with support in C++ and Python (including integration with <code 
class="highlighter-rouge">pandas.Categorical</code>). Integration tests for dictionary encoding (sending this data between C++ and Java) are not yet complete, but we hope to finish them by 0.4.0.</p> + +<p>This is a common data representation technique for categorical data. With it, multiple record batches share a common “dictionary”, and the values in each record batch are integers that reference the dictionary. In statistical languages this kind of data is called “categorical” or “factor”, while in file formats such as Apache Parquet it is used strictly for data compression.</p> + +<h3 id="日付時刻固定長型の拡張">Expanded Date, Time, and Fixed Size Types</h3> + +<p>The 0.2.0 release did not include complete, integration-tested support for the date and time types used in practice. These are needed for integration with <a href="http://parquet.apache.org">Apache Parquet</a> and Apache Spark.</p> + +<ul> + <li><strong>Date</strong>: 32-bit (days unit) and 64-bit (milliseconds unit)</li> + <li><strong>Time</strong>: 64-bit integer with unit (second, millisecond, microsecond, nanosecond)</li> + <li><strong>Timestamp (time elapsed since the UNIX epoch)</strong>: 64-bit integer with unit, with or without timezone</li> + <li><strong>Fixed Size Binary</strong>: Primitive values occupying a fixed number of bytes</li> + <li><strong>Fixed Size List</strong>: Lists in which every value has the same size (no offsets vector is needed alongside the values vector)</li> +</ul> + +<p>The C++ Arrow implementation gained experimental support for exact decimals using <a href="https://github.com/boostorg/multiprecision">Boost.Multiprecision</a>. However, the decimal memory format shared between the Java and C++ implementations is not yet finalized.</p> + +<h3 id="cとpythonのwindowsサポート">C++ and Python Support on Windows</h3> + +<p>This release also includes many packaging improvements for general C++ and Python development. 0.3.0 is the first version with full support for Windows using Visual Studio (MSVC) 2015 and 2017. We run continuous integration for MSVC on Appveyor, and we have written guides for building from source on Windows, for <a href="https://github.com/apache/arrow/blob/master/cpp/apidoc/Windows.md">C++</a> and for <a href="https://github.com/apache/arrow/blob/master/python/doc/source/development.rst">Python</a>.</p> + +<p>The Arrow Python library for Windows can be installed from <a 
href="https://conda-forge.github.io">conda-forge</a>:</p> + +<div class="language-shell highlighter-rouge"><pre class="highlight"><code>conda install pyarrow -c conda-forge +</code></pre> +</div> + +<h3 id="cglibバインディングとrubylua他のサポート">C (GLib) Bindings, with Support for Ruby, Lua, and More</h3> + +<p><a href="http://github.com/kou">Kouhei Sutou</a> is a new Apache Arrow contributor who contributed GLib-based C bindings (to the Arrow C++ implementation) for Linux. Using a C middleware called <a href="https://wiki.gnome.org/Projects/GObjectIntrospection">GObject Introspection</a>, these bindings can be used seamlessly from Ruby, Lua, Go, and <a href="https://wiki.gnome.org/Projects/GObjectIntrospection/Users">various other programming languages</a>. A separate blog post explaining how these bindings work and how to use them is probably needed.</p> + +<h3 id="pysparkを使ったapache-sparkとの連携">Apache Spark Integration for PySpark</h3> + +<p>We are collaborating with the Apache Spark community on <a href="https://issues.apache.org/jira/browse/SPARK-13534">SPARK-13534</a>, which aims to use Arrow to accelerate <code class="highlighter-rouge">DataFrame.toPandas</code> in PySpark. More efficient data serialization has yielded a <a href="https://github.com/apache/spark/pull/15821#issuecomment-282175163"><strong>speedup of over 40x</strong></a> in some cases.</p> + +<p>Using Arrow in PySpark opens the door to performance optimizations that were not possible before, particularly around UDF evaluation (for example, running <code class="highlighter-rouge">map</code> and <code class="highlighter-rouge">filter</code> with Python lambda functions).</p> + +<h3 id="python実装での新しい機能メモリービューfeatherapache-parquetのサポート">New Python Features: Memory Views, Feather, and Apache Parquet Support</h3> + +<p><code class="highlighter-rouge">pyarrow</code>, Arrow’s Python library, is a Cython binding for the <code class="highlighter-rouge">libarrow</code> and <code class="highlighter-rouge">libarrow_python</code> C++ libraries. <code 
class="highlighter-rouge">pyarrow</code> provides seamless interoperability with NumPy, <a href="http://pandas.pydata.org">pandas</a>, and the Python standard library.</p> + +<p>The most important piece of Arrow’s C++ libraries is the <code class="highlighter-rouge">arrow::Buffer</code> object, a managed memory view that supports zero-copy reads and slices. <a href="https://github.com/JeffKnupp">Jeff Knupp</a> contributed integration between Arrow buffers and the Python buffer protocol and memoryviews, which makes code like the following possible:</p> + +<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">frombuffer</span><span class="p">(</span><span class="n">b</span><span class="s">'foobarbaz'</span><span class="p">)</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="n">buf</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="o">&lt;</span><span class="n">pyarrow</span><span class="o">.</span><span class="n">_io</span><span class="o">.</span><span class="n">Buffer</span> <span class="n">at</span> <span class="mh">0x7f6c0a84b538</span><span class="o">&gt;</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="n">memoryview</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="o">&lt;</span><span class="n">memory</span> 
<span class="n">at</span> <span class="mh">0x7f6c0a8c5e88</span><span class="o">&gt;</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">10</span><span class="p">]:</span> <span class="n">buf</span><span class="o">.</span><span class="n">to_pybytes</span><span class="p">()</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">10</span><span class="p">]:</span> <span class="n">b</span><span class="s">'foobarbaz'</span> +</code></pre> +</div> + +<p>We significantly improved <a href="http://parquet.apache.org"><strong>Apache Parquet</strong></a> support by using <a href="https://github.com/apache/parquet-cpp">parquet-cpp</a>, the C++ Parquet implementation. For example, partitioned datasets are now supported, whether on disk or in HDFS. The <a href="https://github.com/dask/dask/commit/68f9e417924a985c1f2e2a587126833c70a2e9f4">Dask project</a> is the first project to implement Arrow-based Parquet support, and we look forward to further collaboration with the Dask developers on distributed processing of pandas data.</p> + +<p>As Arrow’s support for pandas matured, we merged in the <a href="https://github.com/wesm/feather"><strong>Feather format</strong></a> implementation. The Feather format is essentially a special case of the Arrow random access format. Feather development will continue within the Arrow codebase. For example, Feather can now read and write Python file objects by using Arrow’s Python binding layer.</p> + +<p>We also implemented robust support for pandas-specific data types such as <code class="highlighter-rouge">DatetimeTZ</code> and <code class="highlighter-rouge">Categorical</code>.</p> + +<h3 id="cライブラリーでのテンソルサポート">Tensor Support in C++ Libraries</h3> + +<p>One aspect of Apache Arrow is that it is a tool for zero-copy shared memory management. Interest in this capability is growing in the context of machine learning applications. The <a href="https://github.com/ray-project/ray">Ray project</a> from the <a href="https://rise.cs.berkeley.edu/">RISELab</a> at UC Berkeley is the first example.</p> + +<p>Machine learning works with multidimensional arrays, also known as “tensors”. Such data structures go beyond what the Arrow columnar 
format supports. For this case we implemented an additional C++ type, <a href="http://arrow.apache.org/docs/cpp/classarrow_1_1_tensor.html"><code class="highlighter-rouge">arrow::Tensor</code></a>, built on Arrow’s zero-copy shared memory facilities (using <code class="highlighter-rouge">arrow::Buffer</code> to manage memory lifetime). In the C++ implementation we intend to keep providing additional data structures so that Arrow can serve as a common IO and memory management tool.</p> + +<h3 id="javascripttypescript実装の開始">Start of JavaScript (TypeScript) Implementation</h3> + +<p><a href="https://github.com/TheNeuralBit">Brian Hulette</a> started an Arrow implementation in <a href="https://github.com/apache/arrow/tree/master/js">TypeScript</a> for use in NodeJS and browser-based applications. FlatBuffers’ first-class support for JavaScript is speeding up the implementation.</p> + +<h3 id="webサイトと開発者用ドキュメントの改良">Improved Website and Developer Documentation</h3> + +<p>Since the 0.2.0 release we have built a <a href="https://jekyllrb.com">Jekyll</a>-based website system for publishing documentation and blog posts. Kouhei Sutou created the <a href="https://github.com/red-data-tools/jekyll-jupyter-notebook">Jekyll Jupyter Notebook plugin</a>, which makes it possible to use Jupyter to author content for the Arrow website.</p> + +<p>The website now hosts API documentation for C, C++, Java, and Python, which should contain useful information for getting started with Arrow.</p> + +<h3 id="コントリビューター">Contributors</h3> + +<p>Thanks to everyone who contributed patches to this release.</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>$ git shortlog -sn apache-arrow-0.2.0..apache-arrow-0.3.0 + 119 Wes McKinney + 55 Kouhei Sutou + 18 Uwe L. 
Korn + 17 Julien Le Dem + 9 Phillip Cloud + 6 Bryan Cutler + 5 Philipp Moritz + 5 Emilio Lahr-Vivaz + 4 Max Risuhin + 4 Johan Mabille + 4 Jeff Knupp + 3 Steven Phillips + 3 Miki Tebeka + 2 Leif Walsh + 2 Jeff Reback + 2 Brian Hulette + 1 Tsuyoshi Ozawa + 1 rvernica + 1 Nong Li + 1 Julien Lafaye + 1 Itai Incze + 1 Holden Karau + 1 Deepak Majeti +</code></pre> +</div> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/05/07/0.3-release/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/05/07/0.3-release/index.html b/blog/2017/05/07/0.3-release/index.html new file mode 100644 index 0000000..1b6e0f3 --- /dev/null +++ b/blog/2017/05/07/0.3-release/index.html @@ -0,0 +1,364 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script 
src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + 
 </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Apache Arrow 0.3.0 Release + <a href="/blog/2017/05/07/0.3-release/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 07 May 2017 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://wesmckinney.com"><i class="fa fa-user"></i> Wes McKinney (wesm)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>Translations: <a 
href="/blog/2017/05/07/0.3-release-japanese/">日本語</a></p> + +<p>The Apache Arrow team is pleased to announce the 0.3.0 release of the +project. It is the product of an intense 10 weeks of development since the +0.2.0 release from this past February. It includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.3.0"><strong>306 resolved JIRAs</strong></a> +from <a href="https://github.com/apache/arrow/graphs/contributors"><strong>23 contributors</strong></a>.</p> + +<p>While we have added many new features to the different Arrow implementations, +one of the major development focuses in 2017 has been hardening the in-memory +format, type metadata, and messaging protocol to provide a <strong>stable, +production-ready foundation</strong> for big data applications. We are excited to be +collaborating with the <a href="http://spark.apache.org">Apache Spark</a> and <a href="http://www.geomesa.org/">GeoMesa</a> communities on +utilizing Arrow for high performance IO and in-memory data processing.</p> + +<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your platform.</p> + +<p>We will be publishing more information about the Apache Arrow roadmap as we +forge ahead with using Arrow to accelerate big data systems.</p> + +<p>We are looking for more contributors from within our existing communities and +from other communities (such as Go, R, or Julia) to get involved in Arrow +development.</p> + +<h3 id="file-and-streaming-format-hardening">File and Streaming Format Hardening</h3> + +<p>The 0.2.0 release brought with it the first iterations of the <strong>random access</strong> +and <strong>streaming</strong> Arrow wire formats. 
See the <a href="http://arrow.apache.org/docs/ipc.html">IPC specification</a> for +implementation details and an <a href="http://wesmckinney.com/blog/arrow-streaming-columnar/">example blog post</a> with some use cases. These +provide low-overhead, zero-copy access to Arrow record batch payloads.</p> + +<p>In 0.3.0 we have solidified a number of small details with the binary format +and improved our integration and unit testing particularly in the Java, C++, +and Python libraries. Using the <a href="http://github.com/google/flatbuffers">Google Flatbuffers</a> project has helped with +adding new features to our metadata without breaking forward compatibility.</p> + +<p>We are not yet ready to make a firm commitment to strong forward compatibility +(in case we find something needs to change) in the binary format, but we will +make efforts between major releases to avoid unnecessary +breakage. Contributions to the website and component user and API +documentation would also be most welcome.</p> + +<h3 id="dictionary-encoding-support">Dictionary Encoding Support</h3> + +<p><a href="https://github.com/elahrvivaz">Emilio Lahr-Vivaz</a> from the <a href="http://www.geomesa.org/">GeoMesa</a> project contributed Java support +for dictionary-encoded Arrow vectors. We followed up with C++ and Python +support (and <code class="highlighter-rouge">pandas.Categorical</code> integration). We have not yet implemented +full integration tests for dictionaries (for sending this data between C++ and +Java), but hope to achieve this in the 0.4.0 Arrow release.</p> + +<p>This common data representation technique for categorical data allows multiple +record batches to share a common “dictionary”, with the values in the batches +being represented as integers referencing the dictionary. 
This data is called +“categorical” or “factor” in statistical languages, while in file formats like +Apache Parquet it is strictly used for data compression.</p> + +<h3 id="expanded-date-time-and-fixed-size-types">Expanded Date, Time, and Fixed Size Types</h3> + +<p>A notable omission from the 0.2.0 release was complete and integration-tested +support for the gamut of date and time types that occur in the wild. These are +needed for <a href="http://parquet.apache.org">Apache Parquet</a> and Apache Spark integration.</p> + +<ul> + <li><strong>Date</strong>: 32-bit (days unit) and 64-bit (milliseconds unit)</li> + <li><strong>Time</strong>: 64-bit integer with unit (second, millisecond, microsecond, nanosecond)</li> + <li><strong>Timestamp</strong>: 64-bit integer with unit, with or without timezone</li> + <li><strong>Fixed Size Binary</strong>: Primitive values occupying a certain number of bytes</li> + <li><strong>Fixed Size List</strong>: List values with constant size (no separate offsets vector)</li> +</ul> + +<p>We have additionally added experimental support for exact decimals in C++ using +<a href="https://github.com/boostorg/multiprecision">Boost.Multiprecision</a>, though we have not yet hardened the Decimal memory +format between the Java and C++ implementations.</p> + +<h3 id="c-and-python-support-on-windows">C++ and Python Support on Windows</h3> + +<p>We have made many improvements to development and packaging for general +C++ and Python development. 0.3.0 is the first release to bring full C++ and +Python support for Windows on Visual Studio (MSVC) 2015 and 2017. 
In addition +to adding Appveyor continuous integration for MSVC, we have also written guides +for building from source on Windows: <a href="https://github.com/apache/arrow/blob/master/cpp/apidoc/Windows.md">C++</a> and <a href="https://github.com/apache/arrow/blob/master/python/doc/source/development.rst">Python</a>.</p> + +<p>For the first time, you can install the Arrow Python library on Windows from +<a href="https://conda-forge.github.io">conda-forge</a>:</p> + +<div class="language-shell highlighter-rouge"><pre class="highlight"><code>conda install pyarrow -c conda-forge +</code></pre> +</div> + +<h3 id="c-glib-bindings-with-support-for-ruby-lua-and-more">C (GLib) Bindings, with support for Ruby, Lua, and more</h3> + +<p><a href="http://github.com/kou">Kouhei Sutou</a> is a new Apache Arrow contributor and has contributed GLib C +bindings (to the C++ libraries) for Linux. Using a C middleware framework +called <a href="https://wiki.gnome.org/Projects/GObjectIntrospection">GObject Introspection</a>, it is possible to use these bindings +seamlessly in Ruby, Lua, Go, and <a href="https://wiki.gnome.org/Projects/GObjectIntrospection/Users">other programming languages</a>. We will +probably need to publish some follow up blogs explaining how these bindings +work and how to use them.</p> + +<h3 id="apache-spark-integration-for-pyspark">Apache Spark Integration for PySpark</h3> + +<p>We have been collaborating with the Apache Spark community on <a href="https://issues.apache.org/jira/browse/SPARK-13534">SPARK-13534</a> +to add support for using Arrow to accelerate <code class="highlighter-rouge">DataFrame.toPandas</code> in +PySpark. We have observed over <a href="https://github.com/apache/spark/pull/15821#issuecomment-282175163"><strong>40x speedup</strong></a> from the more efficient +data serialization.</p> + +<p>Using Arrow in PySpark opens the door to many other performance optimizations, +particularly around UDF evaluation (e.g. 
<code class="highlighter-rouge">map</code> and <code class="highlighter-rouge">filter</code> operations with +Python lambda functions).</p> + +<h3 id="new-python-feature-memory-views-feather-apache-parquet-support">New Python Feature: Memory Views, Feather, Apache Parquet support</h3> + +<p>Arrow’s Python library <code class="highlighter-rouge">pyarrow</code> is a Cython binding for the <code class="highlighter-rouge">libarrow</code> and +<code class="highlighter-rouge">libarrow_python</code> C++ libraries, which handle interoperability with NumPy, +<a href="http://pandas.pydata.org">pandas</a>, and the Python standard library.</p> + +<p>At the heart of Arrow’s C++ libraries is the <code class="highlighter-rouge">arrow::Buffer</code> object, which is a +managed memory view supporting zero-copy reads and slices. <a href="https://github.com/JeffKnupp">Jeff Knupp</a> +contributed integration between Arrow buffers and the Python buffer protocol +and memoryviews, so now code like this is possible:</p> + +<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">6</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">frombuffer</span><span class="p">(</span><span class="n">b</span><span class="s">'foobarbaz'</span><span class="p">)</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="n">buf</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="o">&lt;</span><span class="n">pyarrow</span><span class="o">.</span><span class="n">_io</span><span 
class="o">.</span><span class="n">Buffer</span> <span class="n">at</span> <span class="mh">0x7f6c0a84b538</span><span class="o">&gt;</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="n">memoryview</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">9</span><span class="p">]:</span> <span class="o">&lt;</span><span class="n">memory</span> <span class="n">at</span> <span class="mh">0x7f6c0a8c5e88</span><span class="o">&gt;</span> + +<span class="n">In</span> <span class="p">[</span><span class="mi">10</span><span class="p">]:</span> <span class="n">buf</span><span class="o">.</span><span class="n">to_pybytes</span><span class="p">()</span> +<span class="n">Out</span><span class="p">[</span><span class="mi">10</span><span class="p">]:</span> <span class="n">b</span><span class="s">'foobarbaz'</span> +</code></pre> +</div> + +<p>We have significantly expanded <a href="http://parquet.apache.org"><strong>Apache Parquet</strong></a> support via the C++ +Parquet implementation <a href="https://github.com/apache/parquet-cpp">parquet-cpp</a>. This includes support for partitioned +datasets on disk or in HDFS. We added initial Arrow-powered Parquet support <a href="https://github.com/dask/dask/commit/68f9e417924a985c1f2e2a587126833c70a2e9f4">in +the Dask project</a>, and look forward to more collaborations with the Dask +developers on distributed processing of pandas data.</p> + +<p>With Arrow’s support for pandas maturing, we were able to merge in the +<a href="https://github.com/wesm/feather"><strong>Feather format</strong></a> implementation, which is essentially a special case of +the Arrow random access format. We’ll be continuing Feather development within +the Arrow codebase. 
For example, Feather can now read from and write to Python +file objects using Arrow’s Python binding layer.</p> + +<p>We also implemented more robust support for pandas-specific data types, like +<code class="highlighter-rouge">DatetimeTZ</code> and <code class="highlighter-rouge">Categorical</code>.</p> + +<h3 id="support-for-tensors-and-beyond-in-c-library">Support for Tensors and beyond in C++ Library</h3> + +<p>There has been increased interest in using Apache Arrow as a tool for zero-copy +shared memory management for machine learning applications. A flagship example +is the <a href="https://github.com/ray-project/ray">Ray project</a> from the UC Berkeley <a href="https://rise.cs.berkeley.edu/">RISELab</a>.</p> + +<p>Machine learning deals in additional kinds of data structures beyond what the +Arrow columnar format supports, like multidimensional arrays, also known as “tensors”. As +such, we implemented the <a href="http://arrow.apache.org/docs/cpp/classarrow_1_1_tensor.html"><code class="highlighter-rouge">arrow::Tensor</code></a> C++ type, which can utilize the +rest of Arrow’s zero-copy shared memory machinery (using <code class="highlighter-rouge">arrow::Buffer</code> for +managing memory lifetime). In C++ in particular, we will want to provide for +additional data structures utilizing common IO and memory management tools.</p> + +<h3 id="start-of-javascript-typescript-implementation">Start of JavaScript (TypeScript) Implementation</h3> + +<p><a href="https://github.com/TheNeuralBit">Brian Hulette</a> started developing an Arrow implementation in +<a href="https://github.com/apache/arrow/tree/master/js">TypeScript</a> for use in NodeJS and browser-side applications. 
We are +benefiting from FlatBuffers’ first-class support for JavaScript.</p> + +<h3 id="improved-website-and-developer-documentation">Improved Website and Developer Documentation</h3> + +<p>Since 0.2.0 we have implemented a new website stack for publishing +documentation and blogs based on <a href="https://jekyllrb.com">Jekyll</a>. Kouhei Sutou developed a <a href="https://github.com/red-data-tools/jekyll-jupyter-notebook">Jekyll +Jupyter Notebook plugin</a> so that we can use Jupyter to author content for +the Arrow website.</p> + +<p>On the website, we have now published API documentation for the C, C++, Java, +and Python subcomponents. Within these you will find easier-to-follow developer +instructions for getting started.</p> + +<h3 id="contributors">Contributors</h3> + +<p>Thanks to all who contributed patches to this release.</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>$ git shortlog -sn apache-arrow-0.2.0..apache-arrow-0.3.0 + 119 Wes McKinney + 55 Kouhei Sutou + 18 Uwe L. 
Korn + 17 Julien Le Dem + 9 Phillip Cloud + 6 Bryan Cutler + 5 Philipp Moritz + 5 Emilio Lahr-Vivaz + 4 Max Risuhin + 4 Johan Mabille + 4 Jeff Knupp + 3 Steven Phillips + 3 Miki Tebeka + 2 Leif Walsh + 2 Jeff Reback + 2 Brian Hulette + 1 Tsuyoshi Ozawa + 1 rvernica + 1 Nong Li + 1 Julien Lafaye + 1 Itai Incze + 1 Holden Karau + 1 Deepak Majeti +</code></pre> +</div> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/05/22/0.4.0-release/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/05/22/0.4.0-release/index.html b/blog/2017/05/22/0.4.0-release/index.html new file mode 100644 index 0000000..a6d3406 --- /dev/null +++ b/blog/2017/05/22/0.4.0-release/index.html @@ -0,0 +1,225 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script 
src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + 
</a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Apache Arrow 0.4.0 Release + <a href="/blog/2017/05/22/0.4.0-release/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 22 May 2017 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://wesmckinney.com"><i class="fa fa-user"></i> Wes McKinney (wesm)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>The Apache Arrow 
team is pleased to announce the 0.4.0 release of the +project. Although only 17 days have passed since the 0.3.0 release, it includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.4.0"><strong>77 resolved +JIRAs</strong></a> with some important new features and bug fixes.</p> + +<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your platform.</p> + +<h3 id="expanded-javascript-implementation">Expanded JavaScript Implementation</h3> + +<p>The TypeScript Arrow implementation has undergone some work since 0.3.0 and can +now read a substantial portion of the Arrow streaming binary format. As this +implementation develops, we will eventually want to include JS in the +integration test suite along with Java and C++ to ensure wire +cross-compatibility.</p> + +<h3 id="python-support-for-apache-parquet-on-windows">Python Support for Apache Parquet on Windows</h3> + +<p>With the <a href="https://github.com/apache/parquet-cpp/releases/tag/apache-parquet-cpp-1.1.0">1.1.0 C++ release</a> of <a href="http://parquet.apache.org">Apache Parquet</a>, we have enabled the +<code class="highlighter-rouge">pyarrow.parquet</code> extension on Windows for Python 3.5 and 3.6. This should +appear in conda-forge packages and PyPI in the near future. Developers can +follow the <a href="http://arrow.apache.org/docs/python/development.html">source build instructions</a>.</p> + +<h3 id="generalizing-arrow-streams">Generalizing Arrow Streams</h3> + +<p>In the 0.2.0 release, we defined the first version of the Arrow streaming +binary format for low-cost messaging with columnar data. These streams presume +that the message components are written as a continuous byte stream over a +socket or file.</p> + +<p>We would like to be able to support other transport protocols, like +<a href="http://grpc.io/">gRPC</a>, for the message components of Arrow streams. 
To that end, in C++ we +defined an abstract stream reader interface, for which the current contiguous +streaming format is one implementation:</p> + +<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">class</span> <span class="nc">RecordBatchReader</span> <span class="p">{</span> + <span class="k">public</span><span class="o">:</span> + <span class="k">virtual</span> <span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">Schema</span><span class="o">&gt;</span> <span class="n">schema</span><span class="p">()</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> + <span class="k">virtual</span> <span class="n">Status</span> <span class="n">GetNextRecordBatch</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">RecordBatch</span><span class="o">&gt;*</span> <span class="n">batch</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> +<span class="p">};</span></code></pre></figure> + +<p>It would also be good to define abstract stream reader and writer interfaces in +the Java implementation.</p> + +<p>In an upcoming blog post, we will explain in more depth how Arrow streams work, +but you can learn more about them by reading the <a href="http://arrow.apache.org/docs/ipc.html">IPC specification</a>.</p> + +<h3 id="c-and-cython-api-for-python-extensions">C++ and Cython API for Python Extensions</h3> + +<p>As other Python libraries with C or C++ extensions use Apache Arrow, they will +need to be able to return Python objects wrapping the underlying C++ +objects. 
In this release, we have implemented a prototype C++ API which enables +Python wrapper objects to be constructed from C++ extension code:</p> + +<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#include "arrow/python/pyarrow.h" +</span> +<span class="k">if</span> <span class="p">(</span><span class="n">arrow</span><span class="o">::</span><span class="n">py</span><span class="o">::</span><span class="n">import_pyarrow</span><span class="p">()</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> + <span class="c1">// Error: import_pyarrow() returns 0 on success, nonzero on failure +</span><span class="p">}</span> + +<span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">arrow</span><span class="o">::</span><span class="n">RecordBatch</span><span class="o">&gt;</span> <span class="n">cpp_batch</span> <span class="o">=</span> <span class="n">GetData</span><span class="p">(...);</span> +<span class="n">PyObject</span><span class="o">*</span> <span class="n">py_batch</span> <span class="o">=</span> <span class="n">arrow</span><span class="o">::</span><span class="n">py</span><span class="o">::</span><span class="n">wrap_batch</span><span class="p">(</span><span class="n">cpp_batch</span><span class="p">);</span></code></pre></figure> + +<p>This API is intended to be usable from Cython code as well:</p> + +<figure class="highlight"><pre><code class="language-cython" data-lang="cython">cimport pyarrow +pyarrow.import_pyarrow()</code></pre></figure> + +<h3 id="python-wheel-installers-on-macos">Python Wheel Installers on macOS</h3> + +<p>With this release, <code class="highlighter-rouge">pip install pyarrow</code> works on macOS (OS X) as well as +Linux. 
We are working on providing binary wheel installers for Windows as well.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/06/14/0.4.1-release/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/06/14/0.4.1-release/index.html b/blog/2017/06/14/0.4.1-release/index.html index 7bb6afa..733fcc8 100644 --- a/blog/2017/06/14/0.4.1-release/index.html +++ b/blog/2017/06/14/0.4.1-release/index.html @@ -64,6 +64,7 @@ <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> </ul> </li> <li class="dropdown"> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/06/16/turbodbc-arrow/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/06/16/turbodbc-arrow/index.html b/blog/2017/06/16/turbodbc-arrow/index.html index 0578000..1cdb16e 100644 --- a/blog/2017/06/16/turbodbc-arrow/index.html +++ b/blog/2017/06/16/turbodbc-arrow/index.html @@ -64,6 +64,7 @@ <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> </ul> </li> <li class="dropdown"> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/07/24/0.5.0-release/index.html 
---------------------------------------------------------------------- diff --git a/blog/2017/07/24/0.5.0-release/index.html b/blog/2017/07/24/0.5.0-release/index.html new file mode 100644 index 0000000..8e99201 --- /dev/null +++ b/blog/2017/07/24/0.5.0-release/index.html @@ -0,0 +1,235 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ 
</a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + <li><a href="/docs/c_glib">C GLib API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF 
Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Apache Arrow 0.5.0 Release + <a href="/blog/2017/07/24/0.5.0-release/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 24 Jul 2017 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://wesmckinney.com"><i class="fa fa-user"></i> Wes McKinney (wesm)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p>The Apache Arrow team is pleased to announce the 0.5.0 release. It includes +<a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0"><strong>130 resolved JIRAs</strong></a> with some new features, expanded integration +testing between implementations, and bug fixes. The Arrow memory format has remained +stable since the 0.3.x and 0.4.x releases.</p> + +<p>See the <a href="http://arrow.apache.org/install">Install Page</a> to learn how to get the libraries for your +platform. The <a href="http://arrow.apache.org/release/0.5.0.html">complete changelog</a> is also available.</p> + +<h2 id="expanded-integration-testing">Expanded Integration Testing</h2> + +<p>In this release, we added compatibility tests for dictionary-encoded data +between Java and C++. 
This enables the distinct values (the <em>dictionary</em>) in a +vector to be transmitted as part of an Arrow schema while the record batches +contain integers which correspond to the dictionary.</p> + +<p>So we might have:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>data (string): ['foo', 'bar', 'foo', 'bar'] +</code></pre> +</div> + +<p>In dictionary-encoded form, this could be represented as:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>indices (int8): [0, 1, 0, 1] +dictionary (string): ['foo', 'bar'] +</code></pre> +</div> + +<p>In upcoming releases, we plan to complete integration testing for the remaining +data types (including some more complicated types like unions and decimals) on +the road to a 1.0.0 release in the future.</p> + +<h2 id="c-activity">C++ Activity</h2> + +<p>We completed a number of significant pieces of work in the C++ part of Apache +Arrow.</p> + +<h3 id="using-jemalloc-as-default-memory-allocator">Using jemalloc as default memory allocator</h3> + +<p>We decided to use <a href="https://github.com/jemalloc/jemalloc">jemalloc</a> as the default memory allocator unless it is +explicitly disabled. This memory allocator has significant performance +advantages in Arrow workloads over the default <code class="highlighter-rouge">malloc</code> implementation. We will +publish a blog post going into more detail about this and why you might care.</p> + +<h3 id="sharing-more-c-code-with-apache-parquet">Sharing more C++ code with Apache Parquet</h3> + +<p>We imported the compression library interfaces and dictionary encoding +algorithms from the <a href="http://github.com/apache/parquet-cpp">Apache Parquet C++ library</a>. 
The Parquet library now +depends on this code in Arrow, and we will be able to use it more easily for +data compression in Arrow use cases.</p> + +<p>As part of incorporating Parquet’s dictionary encoding utilities, we have +developed an <code class="highlighter-rouge">arrow::DictionaryBuilder</code> class to enable building +dictionary-encoded arrays iteratively. This can help save memory and yield +better performance when interacting with databases, Parquet files, or other +sources whose columns may contain many duplicate values.</p> + +<h3 id="support-for-lz4-and-zstd-compressors">Support for LZ4 and ZSTD compressors</h3> + +<p>We added LZ4 and ZSTD compression library support. In ARROW-300 and other +planned work, we intend to add some compression features for data sent via RPC.</p> + +<h2 id="python-activity">Python Activity</h2> + +<p>We fixed many bugs which were affecting Parquet and Feather users and smoothed +several other rough edges with normal Arrow use. We also added some additional +Arrow type conversions: structs, lists embedded in pandas objects, and Arrow +time types (which deserialize to the <code class="highlighter-rouge">datetime.time</code> type).</p> + +<p>In upcoming releases we plan to continue to improve <a href="http://github.com/dask/dask">Dask</a> support and +performance for distributed processing of Apache Parquet files with pyarrow.</p> + +<h2 id="the-road-ahead">The Road Ahead</h2> + +<p>We have much work ahead of us to build out Arrow integrations in other data +systems to improve their processing performance and interoperability with other +systems.</p> + +<p>We are discussing the roadmap to a future 1.0.0 release on the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">developer +mailing list</a>. 
Please join the discussion there.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/07/26/spark-arrow/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/07/26/spark-arrow/index.html b/blog/2017/07/26/spark-arrow/index.html index 425ffbf..e43a7da 100644 --- a/blog/2017/07/26/spark-arrow/index.html +++ b/blog/2017/07/26/spark-arrow/index.html @@ -64,6 +64,7 @@ <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> </ul> </li> <li class="dropdown"> @@ -173,7 +174,7 @@ the conversion to Arrow data can be done on the JVM and pushed back for the Spar executors to perform in parallel, drastically reducing the load on the driver.</p> <p>As of the merging of <a href="https://issues.apache.org/jira/browse/SPARK-13534">SPARK-13534</a>, the use of Arrow when calling <code class="highlighter-rouge">toPandas()</code> -needs to be enabled by setting the SQLConf “spark.sql.execution.arrow.enable” to +needs to be enabled by setting the SQLConf “spark.sql.execution.arrow.enabled” to “true”. 
Let’s look at a simple usage example.</p> <div class="highlighter-rouge"><pre class="highlight"><code>Welcome to @@ -199,7 +200,7 @@ In [2]: %time pdf = df.toPandas() CPU times: user 17.4 s, sys: 792 ms, total: 18.1 s Wall time: 20.7 s -In [3]: spark.conf.set("spark.sql.execution.arrow.enable", "true") +In [3]: spark.conf.set("spark.sql.execution.arrow.enabled", "true") In [4]: %time pdf = df.toPandas() CPU times: user 40 ms, sys: 32 ms, total: 72 ms @@ -234,7 +235,7 @@ It is planned to add pyarrow as a pyspark dependency so that <p>Currently, the controlling SQLConf is disabled by default. This can be enabled programmatically as in the example above or by adding the line -“spark.sql.execution.arrow.enable=true” to <code class="highlighter-rouge">SPARK_HOME/conf/spark-defaults.conf</code>.</p> +“spark.sql.execution.arrow.enabled=true” to <code class="highlighter-rouge">SPARK_HOME/conf/spark-defaults.conf</code>.</p> <p>Also, not all Spark data types are currently supported and limited to primitive types. 
Expanded type support is in the works and expected to also be in the Spark http://git-wip-us.apache.org/repos/asf/arrow-site/blob/61e9ea7e/blog/2017/08/07/plasma-in-memory-object-store/index.html ---------------------------------------------------------------------- diff --git a/blog/2017/08/07/plasma-in-memory-object-store/index.html b/blog/2017/08/07/plasma-in-memory-object-store/index.html new file mode 100644 index 0000000..d2f25da --- /dev/null +++ b/blog/2017/08/07/plasma-in-memory-object-store/index.html @@ -0,0 +1,273 @@ +<!DOCTYPE html> +<html lang="en-US"> + <head> + <meta charset="UTF-8"> + <title>Apache Arrow Homepage</title> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="generator" content="Jekyll v3.4.3"> + <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags --> + <link rel="icon" type="image/x-icon" href="/favicon.ico"> + + <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900"> + + <link href="/css/main.css" rel="stylesheet"> + <link href="/css/syntax.css" rel="stylesheet"> + <script src="https://code.jquery.com/jquery-3.2.1.min.js" + integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=" + crossorigin="anonymous"></script> + <script src="/assets/javascripts/bootstrap.min.js"></script> + + <!-- Global Site Tag (gtag.js) - Google Analytics --> +<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script> +<script> + window.dataLayer = window.dataLayer || []; + function gtag(){dataLayer.push(arguments)}; + gtag('js', new Date()); + + gtag('config', 'UA-107500873-1'); +</script> + + + </head> + + + +<body class="wrap"> + <div class="container"> + <nav class="navbar navbar-default"> + <div class="container-fluid"> + <div class="navbar-header"> + <button type="button" class="navbar-toggle" 
data-toggle="collapse" data-target="#arrow-navbar"> + <span class="sr-only">Toggle navigation</span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + <span class="icon-bar"></span> + </button> + <a class="navbar-brand" href="/">Apache Arrow™ </a> + </div> + + <!-- Collect the nav links, forms, and other content for toggling --> + <div class="collapse navbar-collapse" id="arrow-navbar"> + <ul class="nav navbar-nav"> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Project Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/install/">Install</a></li> + <li><a href="/blog/">Blog</a></li> + <li><a href="/release/">Releases</a></li> + <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li> + <li><a href="https://github.com/apache/arrow">Source Code</a></li> + <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing List</a></li> + <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li> + <li><a href="/committers/">Committers</a></li> + <li><a href="/powered_by/">Powered By</a></li> + </ul> + </li> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Specification<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/memory_layout.html">Memory Layout</a></li> + <li><a href="/docs/metadata.html">Metadata</a></li> + <li><a href="/docs/ipc.html">Messaging / IPC</a></li> + </ul> + </li> + + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">Documentation<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="/docs/python">Python</a></li> + <li><a href="/docs/cpp">C++ API</a></li> + <li><a href="/docs/java">Java API</a></li> + 
<li><a href="/docs/c_glib">C GLib API</a></li> + </ul> + </li> + <!-- <li><a href="/blog">Blog</a></li> --> + <li class="dropdown"> + <a href="#" class="dropdown-toggle" data-toggle="dropdown" + role="button" aria-haspopup="true" + aria-expanded="false">ASF Links<span class="caret"></span> + </a> + <ul class="dropdown-menu"> + <li><a href="http://www.apache.org/">ASF Website</a></li> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </li> + </ul> + <a href="http://www.apache.org/"> + <img style="float:right;" src="/img/asf_logo.svg" width="120px"/> + </a> + </div><!-- /.navbar-collapse --> + </div> + </nav> + + + <h2> + Plasma In-Memory Object Store + <a href="/blog/2017/08/07/plasma-in-memory-object-store/" class="permalink" title="Permalink">∞</a> + </h2> + + + + <div class="panel"> + <div class="panel-body"> + <div> + <span class="label label-default">Published</span> + <span class="published"> + <i class="fa fa-calendar"></i> + 07 Aug 2017 + </span> + </div> + <div> + <span class="label label-default">By</span> + <a href="http://people.apache.org/~Philipp Moritz and Robert Nishihara"><i class="fa fa-user"></i> (Philipp Moritz and Robert Nishihara)</a> + </div> + </div> + </div> + + <!-- + +--> + +<p><em><a href="https://people.eecs.berkeley.edu/~pcmoritz/">Philipp Moritz</a> and <a href="http://www.robertnishihara.com">Robert Nishihara</a> are graduate students at UC + Berkeley.</em></p> + +<h2 id="plasma-a-high-performance-shared-memory-object-store">Plasma: A High-Performance Shared-Memory Object Store</h2> + +<h3 id="motivating-plasma">Motivating Plasma</h3> + +<p>This blog post presents Plasma, an in-memory object store that is being +developed as part of Apache Arrow. 
<strong>Plasma holds immutable objects in shared +memory so that they can be accessed efficiently by many clients across process +boundaries.</strong> In light of the trend toward larger and larger multicore machines, +Plasma enables critical performance optimizations in the big data regime.</p> + +<p>Plasma was initially developed as part of <a href="https://github.com/ray-project/ray">Ray</a>, and has recently been moved +to Apache Arrow in the hopes that it will be broadly useful.</p> + +<p>One of the goals of Apache Arrow is to serve as a common data layer enabling +zero-copy data exchange between multiple frameworks. A key component of this +vision is the use of off-heap memory management (via Plasma) for storing and +sharing Arrow-serialized objects between applications.</p> + +<p><strong>Expensive serialization and deserialization as well as data copying are a +common performance bottleneck in distributed computing.</strong> For example, a +Python-based execution framework that wishes to distribute computation across +multiple Python “worker” processes and then aggregate the results in a single +“driver” process may choose to serialize data using the built-in <code class="highlighter-rouge">pickle</code> +library. Assuming one Python process per core, each worker process would have to +copy and deserialize the data, resulting in excessive memory usage. The driver +process would then have to deserialize results from each of the workers, +resulting in a bottleneck.</p> + +<p>Using Plasma plus Arrow, the data being operated on would be placed in the +Plasma store once, and all of the workers would read the data without copying or +deserializing it (the workers would map the relevant region of memory into their +own address spaces). 
The workers would then put the results of their computation +back into the Plasma store, which the driver could then read and aggregate +without copying or deserializing the data.</p> + +<h3 id="the-plasma-api">The Plasma API:</h3> + +<p>Below we illustrate a subset of the API. The C++ API is documented more fully +<a href="https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md">here</a>, and the Python API is documented <a href="https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst">here</a>.</p> + +<p><strong>Object IDs:</strong> Each object is associated with a string of bytes.</p> + +<p><strong>Creating an object:</strong> Objects are stored in Plasma in two stages. First, the +object store <em>creates</em> the object by allocating a buffer for it. At this point, +the client can write to the buffer and construct the object within the allocated +buffer. When the client is done, the client <em>seals</em> the buffer making the object +immutable and making it available to other Plasma clients.</p> + +<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># Create an object.</span> +<span class="n">object_id</span> <span class="o">=</span> <span class="n">pyarrow</span><span class="o">.</span><span class="n">plasma</span><span class="o">.</span><span class="n">ObjectID</span><span class="p">(</span><span class="mi">20</span> <span class="o">*</span> <span class="n">b</span><span class="s">'a'</span><span class="p">)</span> +<span class="n">object_size</span> <span class="o">=</span> <span class="mi">1000</span> +<span class="nb">buffer</span> <span class="o">=</span> <span class="n">memoryview</span><span class="p">(</span><span class="n">client</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">object_id</span><span class="p">,</span> <span class="n">object_size</span><span class="p">))</span> + +<span class="c"># Write to the buffer.</span> 
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span> + <span class="nb">buffer</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span> + +<span class="c"># Seal the object making it immutable and available to other clients.</span> +<span class="n">client</span><span class="o">.</span><span class="n">seal</span><span class="p">(</span><span class="n">object_id</span><span class="p">)</span> +</code></pre> +</div> + +<p><strong>Getting an object:</strong> After an object has been sealed, any client who knows the +object ID can get the object.</p> + +<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># Get the object from the store. This blocks until the object has been sealed.</span> +<span class="n">object_id</span> <span class="o">=</span> <span class="n">pyarrow</span><span class="o">.</span><span class="n">plasma</span><span class="o">.</span><span class="n">ObjectID</span><span class="p">(</span><span class="mi">20</span> <span class="o">*</span> <span class="n">b</span><span class="s">'a'</span><span class="p">)</span> +<span class="p">[</span><span class="n">buff</span><span class="p">]</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">([</span><span class="n">object_id</span><span class="p">])</span> +<span class="nb">buffer</span> <span class="o">=</span> <span class="n">memoryview</span><span class="p">(</span><span class="n">buff</span><span class="p">)</span> +</code></pre> +</div> + +<p>If the object has not been sealed yet, then the call to <code class="highlighter-rouge">client.get</code> will block +until the object has been sealed.</p> + +<h3 id="a-sorting-application">A sorting application</h3> + +<p>To illustrate the benefits of 
Plasma, we demonstrate an <strong>11x speedup</strong> (on a +machine with 20 physical cores) for sorting a large pandas DataFrame (one +billion entries). The baseline is the built-in pandas sort function, which sorts +the DataFrame in 477 seconds. To leverage multiple cores, we implement the +following standard distributed sorting scheme.</p> + +<ul> + <li>We assume that the data is partitioned across K pandas DataFrames and that +each one already lives in the Plasma store.</li> + <li>We subsample the data, sort the subsampled data, and use the result to define +L non-overlapping buckets.</li> + <li>For each of the K data partitions and each of the L buckets, we find the +subset of the data partition that falls in the bucket, and we sort that +subset.</li> + <li>For each of the L buckets, we gather all of the K sorted subsets that fall in +that bucket.</li> + <li>For each of the L buckets, we merge the corresponding K sorted subsets.</li> + <li>We turn each bucket into a pandas DataFrame and place it in the Plasma store.</li> +</ul> + +<p>Using this scheme, we can sort the DataFrame (the data starts and ends in the +Plasma store), in 44 seconds, giving an 11x speedup over the baseline.</p> + +<h3 id="design">Design</h3> + +<p>The Plasma store runs as a separate process. It is written in C++ and is +designed as a single-threaded event loop based on the <a href="https://redis.io/">Redis</a> event loop library. +The plasma client library can be linked into applications. Clients communicate +with the Plasma store via messages serialized using <a href="https://google.github.io/flatbuffers/">Google Flatbuffers</a>.</p> + +<h3 id="call-for-contributions">Call for contributions</h3> + +<p>Plasma is a work in progress, and the API is currently unstable. Today Plasma is +primarily used in <a href="https://github.com/ray-project/ray">Ray</a> as an in-memory cache for Arrow serialized objects. +We are looking for a broader set of use cases to help refine Plasma’s API. 
In +addition, we are looking for contributions in a variety of areas including +improving performance and building other language bindings. Please let us know +if you are interested in getting involved with the project.</p> + + + + <hr/> +<footer class="footer"> + <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> + <p>© 2017 Apache Software Foundation</p> +</footer> + + </div> +</body> +</html>
