This is an automated email from the ASF dual-hosted git repository.
jorisvandenbossche pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 2f9f892a00 GH-39196: [Python][Docs] Document the Arrow PyCapsule
protocol in the 'extending pyarrow' section of the Python docs (#39199)
2f9f892a00 is described below
commit 2f9f892a0075d990a1b42dc97a97d490b6b08345
Author: Joris Van den Bossche <[email protected]>
AuthorDate: Thu Dec 21 15:53:41 2023 +0100
GH-39196: [Python][Docs] Document the Arrow PyCapsule protocol in the
'extending pyarrow' section of the Python docs (#39199)
### Rationale for this change
While the Arrow PyCapsule protocol itself is defined in the specification
part of the docs, this PR adds a section about it in the Python user guide as
well (referring to the specification for most details), where users might
typically look for Python-specific docs.
* Closes: #39196
Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
---
.../format/CDataInterface/PyCapsuleInterface.rst | 2 ++
docs/source/python/extending_types.rst | 32 ++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff --git a/docs/source/format/CDataInterface/PyCapsuleInterface.rst b/docs/source/format/CDataInterface/PyCapsuleInterface.rst
index 0c1a01d7c6..03095aa2e9 100644
--- a/docs/source/format/CDataInterface/PyCapsuleInterface.rst
+++ b/docs/source/format/CDataInterface/PyCapsuleInterface.rst
@@ -16,6 +16,8 @@
.. under the License.
+.. _arrow-pycapsule-interface:
+
=============================
The Arrow PyCapsule Interface
=============================
diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index ee92cebcb5..b7261005e6 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -21,6 +21,38 @@
Extending pyarrow
=================
+Controlling conversion to (Py)Arrow with the PyCapsule Interface
+----------------------------------------------------------------
+
+The :ref:`Arrow C data interface <c-data-interface>` allows moving Arrow data between
+different implementations of Arrow. This is a generic, cross-language interface not
+specific to Python, but for Python libraries this interface is extended with a
+Python-specific layer: :ref:`arrow-pycapsule-interface`.
+
+This Python interface ensures that different libraries that support the C data interface
+can export Arrow data structures in a standard way and recognize each other's objects.
+
+If you have a Python library providing data structures that hold Arrow-compatible data
+under the hood, you can implement the following methods on those objects:
+
+- ``__arrow_c_schema__`` for schema or type-like objects.
+- ``__arrow_c_array__`` for arrays and record batches (contiguous tables).
+- ``__arrow_c_stream__`` for chunked tables or streams of data.
+
+Those methods return `PyCapsule <https://docs.python.org/3/c-api/capsule.html>`__
+objects, and more details on the exact semantics can be found in the
+:ref:`specification <arrow-pycapsule-interface>`.
+
+When your data structures have those methods defined, the PyArrow constructors
+(such as :func:`pyarrow.array` or :func:`pyarrow.table`) will recognize those objects as
+supporting this protocol, and convert them to PyArrow data structures zero-copy. The
+same can be true for any other library that supports this protocol when ingesting data.
+
+Similarly, if your library has functions that accept user-provided data, you can add
+support for this protocol by checking for the presence of those methods, and
+therefore accept any Arrow data (instead of hardcoding support for a specific
+Arrow producer such as PyArrow).
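
For illustration only (not part of the committed text): a consumer-side sketch of the presence check described above. The function name `detect_arrow_protocol` and the order in which the methods are tried are assumptions; the sketch deliberately avoids importing any particular Arrow library.

```python
def detect_arrow_protocol(obj):
    """Return which PyCapsule protocol method `obj` exposes, or None.

    Hypothetical helper: the stream method is tried first, then the
    array method, then the schema method.
    """
    candidates = (
        ("stream", "__arrow_c_stream__"),
        ("array", "__arrow_c_array__"),
        ("schema", "__arrow_c_schema__"),
    )
    for kind, method in candidates:
        if hasattr(obj, method):
            return kind
    return None


class FakeBatch:
    """Stand-in producer used only to exercise the check."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("illustration only")


kind_for_batch = detect_arrow_protocol(FakeBatch())
kind_for_list = detect_arrow_protocol([1, 2, 3])
```

After a successful check, a real consumer would call the detected method and import the returned capsules with whichever Arrow implementation it uses.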
+
.. _arrow_array_protocol:
Controlling conversion to pyarrow.Array with the ``__arrow_array__`` protocol