http://git-wip-us.apache.org/repos/asf/arrow-site/blob/6360599f/docs/python/data.html ---------------------------------------------------------------------- diff --git a/docs/python/data.html b/docs/python/data.html new file mode 100644 index 0000000..e16f145 --- /dev/null +++ b/docs/python/data.html @@ -0,0 +1,524 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + + +<html xmlns="http://www.w3.org/1999/xhtml"> + <head> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> + + <title>In-Memory Data Model — pyarrow documentation</title> + + <link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" /> + <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> + + <script type="text/javascript"> + var DOCUMENTATION_OPTIONS = { + URL_ROOT: './', + VERSION: '', + COLLAPSE_INDEX: false, + FILE_SUFFIX: '.html', + HAS_SOURCE: true, + SOURCELINK_SUFFIX: '.txt' + }; + </script> + <script type="text/javascript" src="_static/jquery.js"></script> + <script type="text/javascript" src="_static/underscore.js"></script> + <script type="text/javascript" src="_static/doctools.js"></script> + <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> + <link rel="index" title="Index" href="genindex.html" /> + <link rel="search" title="Search" href="search.html" /> + <link rel="next" title="IPC: Fast Streaming and Serialization" href="ipc.html" /> + <link rel="prev" title="Memory and IO Interfaces" href="memory.html" /> + </head> + <body role="document"> + <div class="related" role="navigation" aria-label="related navigation"> + <h3>Navigation</h3> + <ul> + <li class="right" style="margin-right: 10px"> + <a href="genindex.html" title="General Index" + accesskey="I">index</a></li> + <li class="right" > + <a href="ipc.html" title="IPC: Fast Streaming and Serialization" + 
accesskey="N">next</a> |</li> + <li class="right" > + <a href="memory.html" title="Memory and IO Interfaces" + accesskey="P">previous</a> |</li> + <li class="nav-item nav-item-0"><a href="index.html">pyarrow documentation</a> »</li> + </ul> + </div> + <div class="sphinxsidebar" role="navigation" aria-label="main navigation"> + <div class="sphinxsidebarwrapper"> + <h3><a href="index.html">Table Of Contents</a></h3> + <ul> +<li><a class="reference internal" href="#">In-Memory Data Model</a><ul> +<li><a class="reference internal" href="#type-metadata">Type Metadata</a></li> +<li><a class="reference internal" href="#schemas">Schemas</a></li> +<li><a class="reference internal" href="#arrays">Arrays</a><ul> +<li><a class="reference internal" href="#dictionary-arrays">Dictionary Arrays</a></li> +</ul> +</li> +<li><a class="reference internal" href="#record-batches">Record Batches</a></li> +<li><a class="reference internal" href="#tables">Tables</a></li> +<li><a class="reference internal" href="#custom-schema-and-field-metadata">Custom Schema and Field Metadata</a></li> +</ul> +</li> +</ul> + + <h4>Previous topic</h4> + <p class="topless"><a href="memory.html" + title="previous chapter">Memory and IO Interfaces</a></p> + <h4>Next topic</h4> + <p class="topless"><a href="ipc.html" + title="next chapter">IPC: Fast Streaming and Serialization</a></p> + <div role="note" aria-label="source link"> + <h3>This Page</h3> + <ul class="this-page-menu"> + <li><a href="_sources/data.rst.txt" + rel="nofollow">Show Source</a></li> + </ul> + </div> +<div id="searchbox" style="display: none" role="search"> + <h3>Quick search</h3> + <form class="search" action="search.html" method="get"> + <div><input type="text" name="q" /></div> + <div><input type="submit" value="Go" /></div> + <input type="hidden" name="check_keywords" value="yes" /> + <input type="hidden" name="area" value="default" /> + </form> +</div> +<script type="text/javascript">$('#searchbox').show(0);</script> + </div> + </div> 
+ + <div class="document"> + <div class="documentwrapper"> + <div class="bodywrapper"> + <div class="body" role="main"> + + <div class="section" id="in-memory-data-model"> +<span id="data"></span><h1>In-Memory Data Model<a class="headerlink" href="#in-memory-data-model" title="Permalink to this headline">¶</a></h1> +<p>Apache Arrow defines columnar array data structures by composing type metadata +with memory buffers, like the ones explained in the documentation on +<a class="reference internal" href="memory.html#io"><span class="std std-ref">Memory and IO</span></a>. These data structures are exposed in Python through +a series of interrelated classes:</p> +<ul class="simple"> +<li><strong>Type Metadata</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.DataType</span></code>, which describe a logical +array type</li> +<li><strong>Schemas</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.Schema</span></code>, which describe a named +collection of types. 
These can be thought of as the column types in a +table-like object.</li> +<li><strong>Arrays</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.Array</span></code>, which are atomic, contiguous +columnar data structures composed from Arrow Buffer objects</li> +<li><strong>Record Batches</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.RecordBatch</span></code>, which are a +collection of Array objects with a particular Schema</li> +<li><strong>Tables</strong>: Instances of <code class="docutils literal"><span class="pre">pyarrow.Table</span></code>, a logical table data structure in +which each column consists of one or more <code class="docutils literal"><span class="pre">pyarrow.Array</span></code> objects of the +same type.</li> +</ul> +<p>We will examine these in the sections below in a series of examples.</p> +<div class="section" id="type-metadata"> +<span id="data-types"></span><h2>Type Metadata<a class="headerlink" href="#type-metadata" title="Permalink to this headline">¶</a></h2> +<p>Apache Arrow defines language-agnostic column-oriented data structures for +array data. 
These include:</p> +<ul class="simple"> +<li><strong>Fixed-length primitive types</strong>: numbers, booleans, dates and times, fixed +size binary, decimals, and other values that fit into a given number of bits</li> +<li><strong>Variable-length primitive types</strong>: binary, string</li> +<li><strong>Nested types</strong>: list, struct, and union</li> +<li><strong>Dictionary type</strong>: An encoded categorical type (more on this later)</li> +</ul> +<p>Each logical data type in Arrow has a corresponding factory function for +creating an instance of that type object in Python:</p> +<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span> + +<span class="gp">In [2]: </span><span class="n">t1</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">int32</span><span class="p">()</span> + +<span class="gp">In [3]: </span><span class="n">t2</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">string</span><span class="p">()</span> + +<span class="gp">In [4]: </span><span class="n">t3</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">binary</span><span class="p">()</span> + +<span class="gp">In [5]: </span><span class="n">t4</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">binary</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> + +<span class="gp">In [6]: </span><span class="n">t5</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">timestamp</span><span class="p">(</span><span class="s1">'ms'</span><span class="p">)</span> + +<span class="gp">In [7]: </span><span class="n">t1</span> +<span class="gh">Out[7]: </span><span class="go">DataType(int32)</span> + +<span 
class="gp">In [8]: </span><span class="k">print</span><span class="p">(</span><span class="n">t1</span><span class="p">)</span> +<span class="go">