This is an automated email from the ASF dual-hosted git repository.
jorisvandenbossche pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 22b975f4ca7 MINOR: Update docs/python/install.html with GH-41105 (#521)
22b975f4ca7 is described below
commit 22b975f4ca718883b472a78dc64933b8a7cc3586
Author: Bryce Mecum <[email protected]>
AuthorDate: Thu May 16 22:57:52 2024 -0800
MINOR: Update docs/python/install.html with GH-41105 (#521)
My attempt at updating docs/python/install.html with the changes in
https://github.com/apache/arrow/pull/41135.
I generated the docs locally, copied the generated install.html into
arrow-site, and then only committed the hunks I know changed. I didn't
commit the entire changed file since the diff included many more
changes, some of which looked like they'd break the page.
---
docs/python/install.html | 115 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)
diff --git a/docs/python/install.html b/docs/python/install.html
index d012eceb315..124d1fdf796 100644
--- a/docs/python/install.html
+++ b/docs/python/install.html
@@ -1549,6 +1549,13 @@ Linux distributions. We strongly recommend using a
64-bit system.</p>
<div class="highlight-bash notranslate"><div
class="highlight"><pre><span></span>conda<span class="w"> </span>install<span
class="w"> </span>-c<span class="w"> </span>conda-forge<span class="w">
</span>pyarrow
</pre></div>
</div>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>While the <code class="docutils literal notranslate"><span
class="pre">pyarrow</span></code> <a class="reference external"
href="https://conda-forge.org/">conda-forge</a> package is
+the right choice for most users, both a minimal and maximal variant of the
+package exist, either of which may be better for your use case. See
+<a class="reference internal" href="#python-conda-differences"><span
class="std std-ref">Differences between conda-forge packages</span></a>.</p>
+</div>
</section>
<section id="using-pip">
<h2>Using Pip<a class="headerlink" href="#using-pip" title="Permalink to this
heading">#</a></h2>
@@ -1597,6 +1604,114 @@ a custom path to the database from Python:</p>
</div>
</section>
</section>
+<section id="differences-between-conda-forge-packages">
+<span id="python-conda-differences"></span><h2>Differences between conda-forge
packages<a class="headerlink" href="#differences-between-conda-forge-packages"
title="Permalink to this heading">#</a></h2>
+<p>On <a class="reference external"
href="https://conda-forge.org/">conda-forge</a>, PyArrow is published as three
+separate packages, each providing varying levels of functionality. This is in
+contrast to PyPI, where only a single PyArrow package is provided.</p>
+<p>The purpose of this split is to minimize the size of the installed package
for
+most users (<code class="docutils literal notranslate"><span
class="pre">pyarrow</span></code>), provide a smaller, minimal package for
specialized use
+cases (<code class="docutils literal notranslate"><span
class="pre">pyarrow-core</span></code>), while still providing a complete
package for users who
+require it (<code class="docutils literal notranslate"><span
class="pre">pyarrow-all</span></code>). What was historically <code
class="docutils literal notranslate"><span class="pre">pyarrow</span></code> on
+<a class="reference external" href="https://conda-forge.org/">conda-forge</a>
is now <code class="docutils literal notranslate"><span
class="pre">pyarrow-all</span></code>, though most
+users can continue using <code class="docutils literal notranslate"><span
class="pre">pyarrow</span></code>.</p>
+<p>The <code class="docutils literal notranslate"><span
class="pre">pyarrow-core</span></code> package includes the following
functionality:</p>
+<ul class="simple">
+<li><p><a class="reference internal" href="data.html#data"><span class="std
std-ref">Data Types and In-Memory Data Model</span></a></p></li>
+<li><p><a class="reference internal" href="compute.html#compute"><span
class="std std-ref">Compute Functions</span></a> (i.e., <code class="docutils
literal notranslate"><span class="pre">pyarrow.compute</span></code>)</p></li>
+<li><p><a class="reference internal" href="memory.html#io"><span class="std
std-ref">Memory and IO Interfaces</span></a></p></li>
+<li><p><a class="reference internal" href="ipc.html#ipc"><span class="std
std-ref">Streaming, Serialization, and IPC</span></a> (i.e., <code
class="docutils literal notranslate"><span
class="pre">pyarrow.ipc</span></code>)</p></li>
+<li><p><a class="reference internal" href="filesystems.html#filesystem"><span
class="std std-ref">Filesystem Interface</span></a> (i.e., <code
class="docutils literal notranslate"><span
class="pre">pyarrow.fs</span></code>. Note: It’s planned to move cloud
filesystems (i.e., <a class="reference internal"
href="filesystems.html#filesystem-s3"><span class="std std-ref">S3</span></a>,
<a class="reference internal" href="filesystems.html#filesystem-gcs"><span
class="std std-ref">GCS</span></a [...]
+<li><p>File formats: <a class="reference internal"
href="feather.html#feather"><span class="std std-ref">Arrow/Feather</span></a>,
<a class="reference internal" href="json.html#json"><span class="std
std-ref">JSON</span></a>, <a class="reference internal"
href="csv.html#py-csv"><span class="std std-ref">CSV</span></a>, <a
class="reference internal" href="orc.html#orc"><span class="std
std-ref">ORC</span></a> (but not Parquet)</p></li>
+</ul>
+<p>The <code class="docutils literal notranslate"><span
class="pre">pyarrow</span></code> package adds the following:</p>
+<ul class="simple">
+<li><p>Acero (i.e., <code class="docutils literal notranslate"><span
class="pre">pyarrow.acero</span></code>)</p></li>
+<li><p><a class="reference internal" href="dataset.html#dataset"><span
class="std std-ref">Tabular Datasets</span></a> (i.e., <code class="docutils
literal notranslate"><span class="pre">pyarrow.dataset</span></code>)</p></li>
+<li><p><a class="reference internal" href="parquet.html#parquet"><span
class="std std-ref">Parquet</span></a> (i.e., <code class="docutils literal
notranslate"><span class="pre">pyarrow.parquet</span></code>)</p></li>
+<li><p>Substrait (i.e., <code class="docutils literal notranslate"><span
class="pre">pyarrow.substrait</span></code>)</p></li>
+</ul>
+<p>Finally, <code class="docutils literal notranslate"><span
class="pre">pyarrow-all</span></code> adds:</p>
+<ul class="simple">
+<li><p><a class="reference internal" href="flight.html#flight"><span
class="std std-ref">Arrow Flight RPC</span></a> and Flight SQL (i.e., <code
class="docutils literal notranslate"><span
class="pre">pyarrow.flight</span></code>)</p></li>
+<li><p>Gandiva (i.e., <code class="docutils literal notranslate"><span
class="pre">pyarrow.gandiva</span></code>)</p></li>
+</ul>
+<p>The following table lists the functionality provided by each package and
may be
+useful when deciding to use one package over another or when
+<a class="reference internal" href="#python-conda-custom-selection"><span
class="std std-ref">Creating A Custom Selection</span></a>.</p>
+<table class="table">
+<tbody>
+<tr class="row-odd"><td><p>Component</p></td>
+<td><p>Package</p></td>
+<td><p>pyarrow-core</p></td>
+<td><p>pyarrow</p></td>
+<td><p>pyarrow-all</p></td>
+</tr>
+<tr class="row-even"><td><p>Core</p></td>
+<td><p>pyarrow-core</p></td>
+<td><p>✓</p></td>
+<td><p>✓</p></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-odd"><td><p>Parquet</p></td>
+<td><p>libparquet</p></td>
+<td></td>
+<td><p>✓</p></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-even"><td><p>Dataset</p></td>
+<td><p>libarrow-dataset</p></td>
+<td></td>
+<td><p>✓</p></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-odd"><td><p>Acero</p></td>
+<td><p>libarrow-acero</p></td>
+<td></td>
+<td><p>✓</p></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-even"><td><p>Substrait</p></td>
+<td><p>libarrow-substrait</p></td>
+<td></td>
+<td><p>✓</p></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-odd"><td><p>Flight</p></td>
+<td><p>libarrow-flight</p></td>
+<td></td>
+<td></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-even"><td><p>Flight SQL</p></td>
+<td><p>libarrow-flight-sql</p></td>
+<td></td>
+<td></td>
+<td><p>✓</p></td>
+</tr>
+<tr class="row-odd"><td><p>Gandiva</p></td>
+<td><p>libarrow-gandiva</p></td>
+<td></td>
+<td></td>
+<td><p>✓</p></td>
+</tr>
+</tbody>
+</table>
+<section id="creating-a-custom-selection">
+<span id="python-conda-custom-selection"></span><h3>Creating A Custom
Selection<a class="headerlink" href="#creating-a-custom-selection"
title="Permalink to this heading">#</a></h3>
+<p>If you know which components you need and want to control what’s installed,
you
+can create a custom selection of packages to include only the extra features
you
+need. For example, to install <code class="docutils literal notranslate"><span
class="pre">pyarrow-core</span></code> and add support for reading and
+writing Parquet, install <code class="docutils literal notranslate"><span
class="pre">libparquet</span></code> alongside <code class="docutils literal
notranslate"><span class="pre">pyarrow-core</span></code>:</p>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>conda<span class="w"> </span>install<span
class="w"> </span>-c<span class="w"> </span>conda-forge<span class="w">
</span>pyarrow-core<span class="w"> </span>libparquet
+</pre></div>
+</div>
+<p>Or if you wish to use <code class="docutils literal notranslate"><span
class="pre">pyarrow</span></code> but need support for Flight RPC:</p>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>conda<span class="w"> </span>install<span
class="w"> </span>-c<span class="w"> </span>conda-forge<span class="w">
</span>pyarrow<span class="w"> </span>libarrow-flight
+</pre></div>
+</div>
+</section>
+</section>
</section>
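
As a rough companion to the component lists added in this diff, the sketch below checks which optional PyArrow modules can be imported in the current environment, which may help confirm whether a custom selection installed what was intended. The module names come from the added documentation; the `COMPONENTS` mapping and the helper names `is_importable` and `available_components` are hypothetical and are not part of PyArrow. It assumes that a module whose underlying library is absent raises `ImportError` on import.

```python
import importlib

# Modules grouped by the conda-forge package that first provides them,
# following the lists in the documentation added above.
COMPONENTS = {
    "pyarrow-core": ["pyarrow.compute", "pyarrow.ipc", "pyarrow.fs"],
    "pyarrow": [
        "pyarrow.acero",
        "pyarrow.dataset",
        "pyarrow.parquet",
        "pyarrow.substrait",
    ],
    "pyarrow-all": ["pyarrow.flight", "pyarrow.gandiva"],
}


def is_importable(module_name):
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def available_components(probe=is_importable):
    """Map each package tier to {module name: importable?}.

    `probe` is injectable so the function can be exercised without
    PyArrow being installed.
    """
    return {
        tier: {name: probe(name) for name in modules}
        for tier, modules in COMPONENTS.items()
    }


if __name__ == "__main__":
    for tier, modules in available_components().items():
        for name, ok in modules.items():
            status = "available" if ok else "missing"
            print(f"{tier:13} {name:18} {status}")
```

In an environment created with `pyarrow-core` and `libparquet`, one would expect `pyarrow.parquet` to be reported as available while the Flight and Gandiva modules are reported as missing, though this has not been verified against the conda-forge builds.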