This is an automated email from the ASF dual-hosted git repository.
paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git
The following commit(s) were added to refs/heads/main by this push:
new e034caf2 refactor(docs): Shuffle organization of sections to multiple
pages (#460)
e034caf2 is described below
commit e034caf2705ce07b440b8ae15565ef46965c76a3
Author: Dewey Dunnington <[email protected]>
AuthorDate: Thu May 9 10:54:38 2024 -0300
refactor(docs): Shuffle organization of sections to multiple pages (#460)
This PR does some housekeeping on the table of contents to better
organize the sections and accommodate additional Python content that no
longer fits on a single page.
---
docs/.gitignore | 4 +-
docs/README.md | 25 +-
docs/source/conf.py | 1 +
docs/source/getting-started/cpp.rst | 542 ---------------------
docs/source/getting-started/python.rst | 294 -----------
docs/source/getting-started/r.rst | 286 -----------
docs/source/reference/cpp.rst | 13 +-
docs/source/reference/index.rst | 2 +-
.../reference/{index.rst => python/advanced.rst} | 33 +-
.../{python.rst => python/array-stream.rst} | 8 +-
.../reference/{python.rst => python/array.rst} | 8 +-
.../reference/{python.rst => python/index.rst} | 9 +-
.../reference/{python.rst => python/schema.rst} | 8 +-
r/DESCRIPTION | 2 +-
14 files changed, 64 insertions(+), 1171 deletions(-)
diff --git a/docs/.gitignore b/docs/.gitignore
index 8d4d64a1..bd839834 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -17,5 +17,7 @@
_build/
*_generated.rst
-source/getting-started.rst
source/roadmap.rst
+source/getting-started/cpp.rst
+source/getting-started/r.rst
+source/getting-started/python.rst
diff --git a/docs/README.md b/docs/README.md
index f85e504c..921a21c2 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -21,26 +21,17 @@
Building the nanoarrow documentation requires [Python](https://python.org),
[R](https://r-project.org), [Doxygen](https://doxygen.nl), and
[pandoc](https://pandoc.org/). In addition, several Python and R packages are
required. You can install the Python dependencies using `pip install -r
requirements.txt` in this directory; you can install the R dependencies using
`R -e 'install.packages("pkgdown")'`.
+The `ci/scripts/build-docs.sh` script (or the `docker compose run --rm docs`
compose service) can be used to run all steps at once, after which
`sphinx-build source _build/html` can be used to iterate on changes.
+
```bash
git clone https://github.com/apache/arrow-nanoarrow.git
-cd arrow-nanoarrow/docs
-
-# run doxygen for the C API
-pushd ../src/apidoc
-doxygen
-popd
-# run doxygen for the IPC extension
-pushd ../extensions/nanoarrow_ipc/src/apidoc
-doxygen
-popd
+# Usually easiest to start with one of the docs build scripts
+docker compose run --rm docs
+# or install prerequisites and run
+ci/scripts/build-docs.sh
-# copy the readme into rst so that we can include it from sphinx
-pandoc ../README.md --from markdown --to rst -s -o source/README_generated.rst
-
-# Run sphinx to generate the main site
+# Iterate on Sphinx documentation
+cd docs
sphinx-build source _build/html
-
-# Run pkgdown to generate R package documentation
-R -e 'pkgdown::build_site("../r")'
```
diff --git a/docs/source/conf.py b/docs/source/conf.py
index b962452c..d6bfa54d 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -96,6 +96,7 @@ html_theme_options = {
"version_match": get_version(),
},
"navbar_start": ["navbar-logo", "version-switcher"],
+ "navigation_with_keys": False,
}
html_context = {
diff --git a/docs/source/getting-started/cpp.rst
b/docs/source/getting-started/cpp.rst
deleted file mode 100644
index 7c6501c8..00000000
--- a/docs/source/getting-started/cpp.rst
+++ /dev/null
@@ -1,542 +0,0 @@
-.. raw:: html
-
- <!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
- -->
-
-Getting started with nanoarrow in C/C++
-=======================================
-
-This tutorial provides a short example of writing a C++ library that
-exposes an Arrow-based API and uses nanoarrow to implement a simple text
-file reader/writer. In general, nanoarrow can help you write a library
-or application that:
-
-- exposes an Arrow-based API to read from a data source or format,
-- exposes an Arrow-based API to write to a data source or format,
-- exposes one or more compute functions that operate on and produce
- data in the form of Arrow arrays, and/or
-- exposes an extension type implementation.
-
-Because Arrow has bindings in many languages, it means that you or
-others can easily bind or use your tool in higher-level runtimes like R,
-Java, C++, Python, Rust, Julia, Go, or Ruby, among others.
-
-The nanoarrow library is not the only way that an Arrow-based API can be
-implemented: Arrow C++, Rust, and Go are all excellent choices and can
-compile into static libraries that are C-linkable from other languages;
-however, existing Arrow implementations produce relatively large static
-libraries and can present complex build-time or run-time linking
-requirements depending on the implementation and features used. If the
-set of libraries you’re working with already provides the conveniences
-you require, nanoarrow may provide all the functionality you need.
-
-Now that we’ve talked about why you might want to build a library with
-nanoarrow…let’s build one!
-
-.. note::
- This tutorial also goes over some of the basic structure of writing a C++ library.
- If you already know how to do this, feel free to scroll to the code examples provided
- below or take a look at the
- `final example project <https://github.com/apache/arrow-nanoarrow/tree/main/examples/linesplitter>`__.
-
-The library
------------
-
-The library we’ll write in this tutorial is a simple text processing
-library that splits and reassembles lines of text. It will be able to:
-
-- Read text from a buffer into an ``ArrowArray`` as one element per
- line, and
-- Write elements of an ``ArrowArray`` into a buffer, inserting line
- breaks after every element.
-
-For the sake of argument, we’ll call it ``linesplitter``.
-
-The development environment
----------------------------
-
-There are many excellent IDEs that can be used to develop C and C++
-libraries. For this tutorial, we will use
-`VSCode <https://code.visualstudio.com/>`__ and
-`CMake <https://cmake.org/>`__. You’ll need both installed to follow
-along: VSCode can be downloaded from the official site for most
-platforms; CMake is typically installed via your favourite package
-manager (e.g., ``brew install cmake``, ``apt-get install cmake``,
-``dnf install cmake``, etc.). You will also need a C and C++ compiler:
-on MacOS these can be installed using ``xcode-select --install``; on
-Linux you will need the packages that provide ``gcc``, ``g++``, and
-``make`` (e.g., ``apt-get install build-essential``); on Windows you
-will need to install `Visual
-Studio <https://visualstudio.microsoft.com/downloads/>`__ and CMake from
-the official download pages.
-
-Once you have VSCode installed, ensure you have the **CMake Tools** and
-**C/C++** extensions installed. Once your environment is set up, create
-a folder called ``linesplitter`` and open it using **File -> Open
-Folder**.
-
-The interface
--------------
-
-We’ll expose the interface to our library as a header called
-``linesplitter.h``. To ensure the definitions are only included once in
-any given source file, we’ll add the following line at the top:
-
-.. code:: cpp
-
- #pragma once
-
-Then, we need the `Arrow C Data
-interface <https://arrow.apache.org/docs/format/CDataInterface.html#structure-definitions>`__
-itself, since it provides the type definitions that are recognized by
-other Arrow implementations on which our API will be built. It’s
-designed to be copied and pasted in this way - there’s no need to put it
-in another file or include something from another project.
-
-.. code:: cpp
-
- #include <stdint.h>
-
- #ifndef ARROW_C_DATA_INTERFACE
- #define ARROW_C_DATA_INTERFACE
-
- #define ARROW_FLAG_DICTIONARY_ORDERED 1
- #define ARROW_FLAG_NULLABLE 2
- #define ARROW_FLAG_MAP_KEYS_SORTED 4
-
- struct ArrowSchema {
- // Array type description
- const char* format;
- const char* name;
- const char* metadata;
- int64_t flags;
- int64_t n_children;
- struct ArrowSchema** children;
- struct ArrowSchema* dictionary;
-
- // Release callback
- void (*release)(struct ArrowSchema*);
- // Opaque producer-specific data
- void* private_data;
- };
-
- struct ArrowArray {
- // Array data description
- int64_t length;
- int64_t null_count;
- int64_t offset;
- int64_t n_buffers;
- int64_t n_children;
- const void** buffers;
- struct ArrowArray** children;
- struct ArrowArray* dictionary;
-
- // Release callback
- void (*release)(struct ArrowArray*);
- // Opaque producer-specific data
- void* private_data;
- };
-
- #endif // ARROW_C_DATA_INTERFACE
-
-Next, we’ll provide declarations for the functions we’ll implement below:
-
-.. code:: cpp
-
- // Needed for std::string and std::pair in the declarations below
- #include <string>
- #include <utility>
-
- // Builds an ArrowArray of type string that will contain one element for each line
- // in src and places it into out.
- //
- // On success, returns {0, ""}; on error, returns {<errno code>, <error message>}
- std::pair<int, std::string> linesplitter_read(const std::string& src,
- struct ArrowArray* out);
-
- // Concatenates all elements of a string ArrowArray inserting a newline between
- // elements.
- //
- // On success, returns {0, <result>}; on error, returns {<errno code>, <error message>}
- std::pair<int, std::string> linesplitter_write(struct ArrowArray* input);
-
-.. note::
- You may notice that we don't include or mention nanoarrow in any way in the header
- that is exposed to users. Because nanoarrow is designed to be vendored and is not
- distributed as a system library, it is not safe for users of your library to
- ``#include "nanoarrow.h"`` because it might conflict with another library that does
- the same (with possibly a different version of nanoarrow).
-
-Arrow C data/nanoarrow interface basics
----------------------------------------
-
-Now that we’ve seen the functions we need to implement and the Arrow
-types exposed in the C data interface, let’s unpack a few basics about
-using the Arrow C data interface and a few conventions used in the
-nanoarrow implementation.
-
-First, let’s discuss the ``ArrowSchema`` and the ``ArrowArray``. You can
-think of an ``ArrowSchema`` as an expression of a data type, whereas an
-``ArrowArray`` is the data itself. These structures accommodate nested
-types: columns are encoded in the ``children`` member of each. You
-always need to know the data type of an ``ArrowArray`` before accessing
-its contents. In our case we only operate on arrays of one type
-(“string”) and document that in our interface; for functions that
-operate on more than one type of array you will need to accept an
-``ArrowSchema`` and inspect it (e.g., using nanoarrow’s helper
-functions).
-
-Second, let’s discuss error handling. You may have noticed in the
-function declarations above that the first element of each returned pair
-is an ``int``, which is an errno-compatible error code or ``0`` to
-indicate success. Functions in nanoarrow that need to communicate more
-detailed error information accept an ``ArrowError*`` argument (which can
-be ``NULL`` if the caller does not care about the extra information). Any
-nanoarrow function that might fail communicates errors in this way. To
-avoid verbose code like the following:
-
-.. code:: c
-
- int init_string_non_null(struct ArrowSchema* schema) {
- int code = ArrowSchemaInitFromType(schema, NANOARROW_TYPE_STRING);
- if (code != NANOARROW_OK) {
- return code;
- }
-
- schema->flags &= ~ARROW_FLAG_NULLABLE;
- return NANOARROW_OK;
- }
-
-…you can use the ``NANOARROW_RETURN_NOT_OK()`` macro:
-
-.. code:: c
-
- int init_string_non_null(struct ArrowSchema* schema) {
- NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema, NANOARROW_TYPE_STRING));
- schema->flags &= ~ARROW_FLAG_NULLABLE;
- return NANOARROW_OK;
- }
-
-This works as long as your internal functions that use nanoarrow also
-return ``int`` and, where needed, accept an ``ArrowError*`` argument. This usually means
-that there is an outer function that presents a more idiomatic interface
-(e.g., returning ``std::optional<>`` or throwing an exception) and an
-inner function that uses nanoarrow-style error handling. Embracing
-``NANOARROW_RETURN_NOT_OK()`` is key to happiness when using the
-nanoarrow library.
-
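The inner/outer split described above can be sketched without nanoarrow at all. The block below is a minimal illustration under stated assumptions: ``SKETCH_RETURN_NOT_OK``, ``check_not_empty()``, ``check_short()``, ``validate_internal()``, and ``validate()`` are hypothetical names invented for this sketch, and the macro is only a stand-in that behaves like an early-return helper, not nanoarrow's actual definition of ``NANOARROW_RETURN_NOT_OK()``.

```cpp
#include <cerrno>
#include <string>
#include <utility>

// Hypothetical stand-in for an early-return macro: evaluate EXPR once and
// return its value from the enclosing function if it is non-zero.
#define SKETCH_RETURN_NOT_OK(EXPR)              \
  do {                                          \
    const int sketch_code_ = (EXPR);            \
    if (sketch_code_ != 0) return sketch_code_; \
  } while (0)

// Hypothetical inner steps that report errno-compatible codes (0 on success).
int check_not_empty(const std::string& s) { return s.empty() ? EINVAL : 0; }
int check_short(const std::string& s) { return s.size() > 8 ? ERANGE : 0; }

// Inner function: errno-style error handling with early returns.
int validate_internal(const std::string& s) {
  SKETCH_RETURN_NOT_OK(check_not_empty(s));
  SKETCH_RETURN_NOT_OK(check_short(s));
  return 0;
}

// Outer function: presents the same {code, message} shape used by the
// linesplitter functions in this tutorial.
std::pair<int, std::string> validate(const std::string& s) {
  const int code = validate_internal(s);
  if (code != 0) {
    return {code, "validation failed"};
  }
  return {0, ""};
}
```

Keeping the early returns in the inner function means the outer function translates an error code into its own interface in exactly one place.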
-Third, let’s discuss memory management. Because nanoarrow is implemented
-in C and provides a C interface, the library by default uses C-style
-memory management (i.e., if you allocate it, you clean it up). This is
-unnecessary when you have C++ at your disposal, so nanoarrow also
-provides a C++ header (``nanoarrow.hpp``) with
-``std::unique_ptr<>``-like wrappers around anything that requires
-explicit clean up. Whereas in C you might have to write code like this:
-
-.. code:: c
-
- struct ArrowSchema schema;
- struct ArrowArray array;
-
- // Ok: if this returns, array was not initialized
- NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(&schema, NANOARROW_TYPE_STRING));
-
- // Verbose: if this fails, we need to release schema before returning
- // or it will leak.
- int code = ArrowArrayInitFromSchema(&array, &schema, NULL);
- if (code != NANOARROW_OK) {
- ArrowSchemaRelease(&schema);
- return code;
- }
-
-…using the ``nanoarrow.hpp`` types we can do:
-
-.. code:: cpp
-
- nanoarrow::UniqueSchema schema;
- nanoarrow::UniqueArray array;
-
- NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRING));
- NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromSchema(array.get(), schema.get(), NULL));
-
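To see why an owning wrapper removes the manual cleanup branch, here is a minimal self-contained sketch. Everything in it is hypothetical and written only for illustration: ``Resource`` mimics the release-callback convention of the C data interface structs, ``UniqueResource`` is a simplified owner in the spirit of the ``nanoarrow.hpp`` wrappers (not their actual implementation), and ``live_resources`` exists only so that cleanup can be observed.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical resource following the release-callback convention: a non-null
// release member marks the struct as valid; calling it frees private_data and
// marks the struct as released.
struct Resource {
  void (*release)(Resource*);
  void* private_data;
};

int live_resources = 0;  // used only to observe cleanup in this sketch

void resource_release(Resource* r) {
  std::free(r->private_data);
  r->private_data = nullptr;
  r->release = nullptr;
  live_resources--;
}

int resource_init(Resource* r) {
  r->private_data = std::malloc(16);
  if (r->private_data == nullptr) return ENOMEM;
  r->release = &resource_release;
  live_resources++;
  return 0;
}

// Hypothetical owning wrapper: releases the resource when it goes out of scope.
class UniqueResource {
 public:
  UniqueResource() {
    data_.release = nullptr;
    data_.private_data = nullptr;
  }
  ~UniqueResource() {
    if (data_.release != nullptr) data_.release(&data_);
  }
  UniqueResource(const UniqueResource&) = delete;
  UniqueResource& operator=(const UniqueResource&) = delete;
  Resource* get() { return &data_; }

 private:
  Resource data_;
};

int use_resource(bool fail_midway) {
  UniqueResource res;
  const int code = resource_init(res.get());
  if (code != 0) return code;
  if (fail_midway) return EIO;  // early return: the destructor still releases
  return 0;
}
```

Because the destructor runs on every exit path, an early return after a successful initialization no longer needs an explicit release call.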
-Building the library
---------------------
-
-Our library implementation will live in ``linesplitter.cc``. Before
-writing the actual implementations, let’s add just enough to our project
-that we can build it using VSCode’s C/C++/CMake integration:
-
-.. code:: cpp
-
- #include <cerrno>
- #include <cstdint>
- #include <sstream>
- #include <string>
- #include <utility>
-
- #include "nanoarrow/nanoarrow.hpp"
-
- #include "linesplitter.h"
-
- std::pair<int, std::string> linesplitter_read(const std::string& src,
- struct ArrowArray* out) {
- return {ENOTSUP, ""};
- }
-
- std::pair<int, std::string> linesplitter_write(struct ArrowArray* input) {
- return {ENOTSUP, ""};
- }
-
-We also need a ``CMakeLists.txt`` file that tells CMake and VSCode what
-to build. CMake has a lot of options and can scale to coordinate very
-large projects; however we only need a few lines to leverage VSCode’s
-integration.
-
-.. code:: cmake
-
- project(linesplitter)
-
- set(CMAKE_CXX_STANDARD 11)
-
- include(FetchContent)
-
- FetchContent_Declare(
- nanoarrow
- URL https://github.com/apache/arrow-nanoarrow/releases/download/apache-arrow-nanoarrow-0.2.0/apache-arrow-nanoarrow-0.2.0.tar.gz
- URL_HASH SHA512=38a100ae5c36a33aa330010eb27b051cff98671e9c82fff22b1692bb77ae61bd6dc2a52ac6922c6c8657bd4c79a059ab26e8413de8169eeed3c9b7fdb216c817)
- FetchContent_MakeAvailable(nanoarrow)
-
- add_library(linesplitter linesplitter.cc)
- target_link_libraries(linesplitter PRIVATE nanoarrow)
-
-After saving ``CMakeLists.txt``, you may have to close and re-open the
-``linesplitter`` directory in VSCode to activate the CMake integration.
-From the command palette (i.e., Control/Command-Shift-P), choose
-**CMake: Build**. If all went well, you should see a few lines of output
-indicating progress towards building and linking ``linesplitter``.
-
-.. note::
- Depending on your version of CMake you might also see a few warnings. This CMakeLists.txt
- is intentionally minimal and as such does not attempt to silence them.
-
-.. note::
- If you're not using VSCode, you can accomplish the equivalent task in a terminal
- with ``mkdir build && cd build && cmake .. && cmake --build .``.
-
-Building an ArrowArray
-----------------------
-
-The input for our ``linesplitter_read()`` function is an
-``std::string``, which we’ll iterate over and add each detected line as
-its own element. First, we’ll define a function for the core logic of
-detecting the number of characters until the next ``\n`` or
-end-of-string.
-
-.. code:: cpp
-
- static int64_t find_newline(const ArrowStringView& src) {
- for (int64_t i = 0; i < src.size_bytes; i++) {
- if (src.data[i] == '\n') {
- return i;
- }
- }
-
- return src.size_bytes;
- }
-
-The next function we’ll define is an internal function that uses
-nanoarrow-style error handling. This uses the ``ArrowArrayAppend*()``
-family of functions provided by nanoarrow to build the array:
-
-.. code:: cpp
-
- static int linesplitter_read_internal(const std::string& src, ArrowArray* out,
- ArrowError* error) {
- nanoarrow::UniqueArray tmp;
- NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(tmp.get(), NANOARROW_TYPE_STRING));
- NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(tmp.get()));
-
- ArrowStringView src_view = {src.data(), static_cast<int64_t>(src.size())};
- ArrowStringView line_view;
- int64_t next_newline = -1;
- while ((next_newline = find_newline(src_view)) >= 0) {
- line_view = {src_view.data, next_newline};
- NANOARROW_RETURN_NOT_OK(ArrowArrayAppendString(tmp.get(), line_view));
- src_view.data += next_newline + 1;
- src_view.size_bytes -= next_newline + 1;
- }
-
- NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(tmp.get(), error));
-
- ArrowArrayMove(tmp.get(), out);
- return NANOARROW_OK;
- }
-
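The splitting loop above can be exercised without nanoarrow. The following is a minimal sketch, assuming a hypothetical ``StringView`` struct standing in for ``ArrowStringView`` and a hypothetical ``split_lines()`` helper that collects lines into a ``std::vector<std::string>`` instead of appending them to an ``ArrowArray``. It mirrors the loop's behaviour, including the fact that input ending in a newline yields a trailing empty element.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for ArrowStringView: a pointer plus a byte count.
struct StringView {
  const char* data;
  int64_t size_bytes;
};

// Same logic as find_newline() above: number of bytes until the next '\n'
// or until the end of the view.
int64_t find_newline(const StringView& src) {
  for (int64_t i = 0; i < src.size_bytes; i++) {
    if (src.data[i] == '\n') {
      return i;
    }
  }
  return src.size_bytes;
}

// Hypothetical helper mirroring the loop in linesplitter_read_internal().
// After the final line is consumed, size_bytes becomes -1, find_newline()
// returns -1, and the loop terminates.
std::vector<std::string> split_lines(const std::string& src) {
  std::vector<std::string> lines;
  StringView view = {src.data(), static_cast<int64_t>(src.size())};
  int64_t next_newline = -1;
  while ((next_newline = find_newline(view)) >= 0) {
    lines.emplace_back(view.data, static_cast<std::size_t>(next_newline));
    view.data += next_newline + 1;
    view.size_bytes -= next_newline + 1;
  }
  return lines;
}
```

For example, ``split_lines("line1\nline2\nline3")`` produces three elements, while ``split_lines("a\n")`` produces ``"a"`` followed by an empty string.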
-Finally, we define a wrapper that corresponds to the outer function
-definition.
-
-.. code:: cpp
-
- std::pair<int, std::string> linesplitter_read(const std::string& src, ArrowArray* out) {
- ArrowError error;
- int code = linesplitter_read_internal(src, out, &error);
- if (code != NANOARROW_OK) {
- return {code, std::string(ArrowErrorMessage(&error))};
- } else {
- return {NANOARROW_OK, ""};
- }
- }
-
-Reading an ArrowArray
----------------------
-
-The input for our ``linesplitter_write()`` function is an
-``ArrowArray*`` like the one we create in ``linesplitter_read()``. Just
-as nanoarrow provides helpers to build arrays, it also provides helpers
-to read them via the ``ArrowArrayView*()`` family of functions. Again,
-we first define an internal function that uses nanoarrow-style error
-handling:
-
-.. code:: cpp
-
- static int linesplitter_write_internal(ArrowArray* input, std::stringstream& out,
- ArrowError* error) {
- nanoarrow::UniqueArrayView input_view;
- ArrowArrayViewInitFromType(input_view.get(), NANOARROW_TYPE_STRING);
- NANOARROW_RETURN_NOT_OK(ArrowArrayViewSetArray(input_view.get(), input, error));
-
- ArrowStringView item;
- for (int64_t i = 0; i < input->length; i++) {
- if (ArrowArrayViewIsNull(input_view.get(), i)) {
- out << "\n";
- } else {
- item = ArrowArrayViewGetStringUnsafe(input_view.get(), i);
- out << std::string(item.data, item.size_bytes) << "\n";
- }
- }
-
- return NANOARROW_OK;
- }
-
-Then, provide an outer wrapper that corresponds to the outer function
-definition.
-
-.. code:: cpp
-
- std::pair<int, std::string> linesplitter_write(ArrowArray* input) {
- std::stringstream out;
- ArrowError error;
- int code = linesplitter_write_internal(input, out, &error);
- if (code != NANOARROW_OK) {
- return {code, std::string(ArrowErrorMessage(&error))};
- } else {
- return {NANOARROW_OK, out.str()};
- }
- }
-
-Testing
--------
-
-We have an implementation, but does it work? Unlike higher-level
-runtimes like R and Python, we can’t just open a prompt and type some
-code to find out. For C and C++ libraries, the
-`googletest <https://google.github.io/googletest/quickstart-cmake.html>`__
-framework provides a quick and easy way to do this that scales nicely as
-the complexity of your project grows.
-
-First, we’ll add a stub test and some CMake to get going. In
-``linesplitter_test.cc``, add the following:
-
-.. code:: cpp
-
- #include <gtest/gtest.h>
-
- #include "nanoarrow/nanoarrow.hpp"
-
- #include "linesplitter.h"
-
- TEST(Linesplitter, LinesplitterRoundtrip) {
- EXPECT_EQ(4, 4);
- }
-
-Then, add the following to your ``CMakeLists.txt``:
-
-.. code:: cmake
-
- FetchContent_Declare(
- googletest
- URL https://github.com/google/googletest/archive/refs/tags/v1.13.0.zip
- )
- FetchContent_MakeAvailable(googletest)
-
- enable_testing()
-
- add_executable(linesplitter_test linesplitter_test.cc)
- target_link_libraries(linesplitter_test linesplitter GTest::gtest_main)
-
- include(GoogleTest)
- gtest_discover_tests(linesplitter_test)
-
-After you’re done, build the project again using the **CMake: Build**
-command from the command palette. If all goes well, choose **CMake:
-Refresh Tests** and then **Test: Run All Tests** from the command
-palette to run them! You should see some output indicating that tests
-ran successfully, or you can use VSCode’s “Testing” panel to visually
-inspect which tests passed.
-
-.. note::
- If you're not using VSCode, you can accomplish the equivalent task in a terminal
- with ``cd build && ctest .``.
-
-Now we’re ready to fill in the test! Our two functions happen to round
-trip, so a useful first test might be to check that a read followed by a
-write reproduces the input lines.
-
-.. code:: cpp
-
- TEST(Linesplitter, LinesplitterRoundtrip) {
- nanoarrow::UniqueArray out;
- auto result = linesplitter_read("line1\nline2\nline3", out.get());
- ASSERT_EQ(result.first, 0);
- ASSERT_EQ(result.second, "");
-
- ASSERT_EQ(out->length, 3);
-
- nanoarrow::UniqueArrayView out_view;
- ArrowArrayViewInitFromType(out_view.get(), NANOARROW_TYPE_STRING);
- ASSERT_EQ(ArrowArrayViewSetArray(out_view.get(), out.get(), nullptr), 0);
- ArrowStringView item;
-
- item = ArrowArrayViewGetStringUnsafe(out_view.get(), 0);
- ASSERT_EQ(std::string(item.data, item.size_bytes), "line1");
-
- item = ArrowArrayViewGetStringUnsafe(out_view.get(), 1);
- ASSERT_EQ(std::string(item.data, item.size_bytes), "line2");
-
- item = ArrowArrayViewGetStringUnsafe(out_view.get(), 2);
- ASSERT_EQ(std::string(item.data, item.size_bytes), "line3");
-
-
- auto result2 = linesplitter_write(out.get());
- ASSERT_EQ(result2.first, 0);
- ASSERT_EQ(result2.second, "line1\nline2\nline3\n");
- }
-
-Writing tests in this way also opens up a relatively straightforward
-debug path via the **CMake: Set Debug target** and **CMake: Debug**
-commands. If the first thing that happens when you run your test
-is a crash, running the tests with the debugger turned on will
-automatically pause at the line of code that caused the crash. For more
-fine-tuned debugging, you can set breakpoints and step through code.
-
-Summary
--------
-
-This tutorial covered the basics of writing and testing a C++ library
-exposing an Arrow-based API implemented using the nanoarrow C library.
diff --git a/docs/source/getting-started/python.rst
b/docs/source/getting-started/python.rst
deleted file mode 100644
index 2b909e13..00000000
--- a/docs/source/getting-started/python.rst
+++ /dev/null
@@ -1,294 +0,0 @@
-.. raw:: html
-
- <!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
- -->
-
-.. raw:: html
-
- <!-- Render with jupyter nbconvert --to markdown README.ipynb -->
-
-nanoarrow for Python
-====================
-
-The nanoarrow Python package provides bindings to the nanoarrow C
-library. Like the nanoarrow C library, it provides tools to facilitate
-the use of the `Arrow C
-Data <https://arrow.apache.org/docs/format/CDataInterface.html>`__ and
-`Arrow C
-Stream <https://arrow.apache.org/docs/format/CStreamInterface.html>`__
-interfaces.
-
-Installation
-------------
-
-Python bindings for nanoarrow are not yet available on PyPI. You can
-install via URL (requires a C compiler):
-
-.. code:: bash
-
- python -m pip install "git+https://github.com/apache/arrow-nanoarrow.git#egg=nanoarrow&subdirectory=python"
-
-If you can import the namespace, you’re good to go!
-
-.. code:: python
-
- import nanoarrow as na
-
-Low-level C library bindings
-----------------------------
-
-The Arrow C Data and Arrow C Stream interfaces are comprised of three
-structures: the ``ArrowSchema`` which represents a data type of an
-array, the ``ArrowArray`` which represents the values of an array, and
-an ``ArrowArrayStream``, which represents zero or more ``ArrowArray``\ s
-with a common ``ArrowSchema``.
-
-Schemas
-~~~~~~~
-
-Use ``nanoarrow.c_schema()`` to convert an object to an ``ArrowSchema``
-and wrap it as a Python object. This works for any object implementing
-the `Arrow PyCapsule
-Interface <https://arrow.apache.org/docs/format/CDataInterface.html>`__
-(e.g., ``pyarrow.Schema``, ``pyarrow.DataType``, and ``pyarrow.Field``).
-
-.. code:: python
-
- import pyarrow as pa
- schema = na.c_schema(pa.decimal128(10, 3))
- schema
-
-::
-
- <nanoarrow.c_lib.CSchema decimal128(10, 3)>
- - format: 'd:10,3'
- - name: ''
- - flags: 2
- - metadata: NULL
- - dictionary: NULL
- - children[0]:
-
-You can extract the fields of a ``CSchema`` object one at a time or
-parse it into a view to extract deserialized parameters.
-
-.. code:: python
-
- na.c_schema_view(schema)
-
-::
-
- <nanoarrow.c_lib.CSchemaView>
- - type: 'decimal128'
- - storage_type: 'decimal128'
- - decimal_bitwidth: 128
- - decimal_precision: 10
- - decimal_scale: 3
-
-Advanced users can allocate an empty ``CSchema`` and populate its
-contents by passing its ``._addr()`` to a schema-exporting function.
-
-.. code:: python
-
- schema = na.allocate_c_schema()
- pa.int32()._export_to_c(schema._addr())
- schema
-
-::
-
- <nanoarrow.c_lib.CSchema int32>
- - format: 'i'
- - name: ''
- - flags: 2
- - metadata: NULL
- - dictionary: NULL
- - children[0]:
-
-The ``CSchema`` object cleans up after itself: when the object is
-deleted, the underlying ``ArrowSchema`` is released.
-
-Arrays
-~~~~~~
-
-You can use ``nanoarrow.c_array()`` to convert an array-like object to
-an ``ArrowArray``, wrap it as a Python object, and attach a schema that
-can be used to interpret its contents. This works for any object
-implementing the `Arrow PyCapsule
-Interface <https://arrow.apache.org/docs/format/CDataInterface.html>`__
-(e.g., ``pyarrow.Array``, ``pyarrow.RecordBatch``).
-
-.. code:: python
-
- array = na.c_array(pa.array(["one", "two", "three", None]))
- array
-
-::
-
- <nanoarrow.c_lib.CArray string>
- - length: 4
- - offset: 0
- - null_count: 1
- - buffers: (2939032895680, 2939032895616, 2939032895744)
- - dictionary: NULL
- - children[0]:
-
-You can extract the fields of a ``CArray`` one at a time or parse it
-into a view to extract deserialized content:
-
-.. code:: python
-
- na.c_array_view(array)
-
-::
-
- <nanoarrow.c_lib.CArrayView>
- - storage_type: 'string'
- - length: 4
- - offset: 0
- - null_count: 1
- - buffers[3]:
- - <bool validity[1 b] 11100000>
- - <int32 data_offset[20 b] 0 3 6 11 11>
- - <string data[11 b] b'onetwothree'>
- - dictionary: NULL
- - children[0]:
-
-As with the ``CSchema``, you can allocate an empty ``CArray`` and access its
-address with ``_addr()`` to pass to other array-exporting functions.
-
-.. code:: python
-
- array = na.allocate_c_array()
- pa.array([1, 2, 3])._export_to_c(array._addr(), array.schema._addr())
- array.length
-
-::
-
- 3
-
-Array streams
-~~~~~~~~~~~~~
-
-You can use ``nanoarrow.c_array_stream()`` to convert an object
-representing a sequence of ``CArray``\ s with a common ``CSchema`` to an
-``ArrowArrayStream`` and wrap it as a Python object. This works for any
-object implementing the `Arrow PyCapsule
-Interface <https://arrow.apache.org/docs/format/CDataInterface.html>`__
-(e.g., ``pyarrow.RecordBatchReader``).
-
-.. code:: python
-
- pa_array_child = pa.array([1, 2, 3], pa.int32())
- pa_array = pa.record_batch([pa_array_child], names=["some_column"])
- reader = pa.RecordBatchReader.from_batches(pa_array.schema, [pa_array])
- array_stream = na.c_array_stream(reader)
- array_stream
-
-::
-
- <nanoarrow.c_lib.CArrayStream>
- - get_schema(): <nanoarrow.c_lib.CSchema struct>
- - format: '+s'
- - name: ''
- - flags: 0
- - metadata: NULL
- - dictionary: NULL
- - children[1]:
- 'some_column': <nanoarrow.c_lib.CSchema int32>
- - format: 'i'
- - name: 'some_column'
- - flags: 2
- - metadata: NULL
- - dictionary: NULL
- - children[0]:
-
-You can pull the next array from the stream using ``.get_next()`` or use
-it like an iterator. The ``.get_next()`` method will raise
-``StopIteration`` when there are no more arrays in the stream.
-
-.. code:: python
-
- for array in array_stream:
- print(array)
-
-::
-
- <nanoarrow.c_lib.CArray struct>
- - length: 3
- - offset: 0
- - null_count: 0
- - buffers: (0,)
- - dictionary: NULL
- - children[1]:
- 'some_column': <nanoarrow.c_lib.CArray int32>
- - length: 3
- - offset: 0
- - null_count: 0
- - buffers: (0, 2939033026688)
- - dictionary: NULL
- - children[0]:
-
-You can also get the address of a freshly-allocated stream to pass to a
-suitable exporting function:
-
-.. code:: python
-
- array_stream = na.allocate_c_array_stream()
- reader._export_to_c(array_stream._addr())
- array_stream
-
-::
-
- <nanoarrow.c_lib.CArrayStream>
- - get_schema(): <nanoarrow.c_lib.CSchema struct>
- - format: '+s'
- - name: ''
- - flags: 0
- - metadata: NULL
- - dictionary: NULL
- - children[1]:
- 'some_column': <nanoarrow.c_lib.CSchema int32>
- - format: 'i'
- - name: 'some_column'
- - flags: 2
- - metadata: NULL
- - dictionary: NULL
- - children[0]:
-
-Development
------------
-
-Python bindings for nanoarrow are managed with
-`setuptools <https://setuptools.pypa.io/en/latest/index.html>`__. This
-means you can build the project using:
-
-.. code:: shell
-
- git clone https://github.com/apache/arrow-nanoarrow.git
- cd arrow-nanoarrow/python
- pip install -e .
-
-Tests use `pytest <https://docs.pytest.org/>`__:
-
-.. code:: shell
-
- # Install dependencies
- pip install -e .[test]
-
- # Run tests
- pytest -vvx
diff --git a/docs/source/getting-started/r.rst
b/docs/source/getting-started/r.rst
deleted file mode 100644
index f69f1e07..00000000
--- a/docs/source/getting-started/r.rst
+++ /dev/null
@@ -1,286 +0,0 @@
-.. raw:: html
-
- <!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
- -->
-
-.. raw:: html
-
- <!-- README.md is generated from README.Rmd. Please edit that file -->
-
-nanoarrow
-=========
-
-.. raw:: html
-
- <!-- badges: start -->
-
-.. raw:: html
-
- <!-- badges: end -->
-
-The goal of nanoarrow is to provide minimal useful bindings to the
-`Arrow C
-Data <https://arrow.apache.org/docs/format/CDataInterface.html>`__ and
-`Arrow C
-Stream <https://arrow.apache.org/docs/format/CStreamInterface.html>`__
-interfaces using the `nanoarrow C
-library <https://arrow.apache.org/nanoarrow>`__.
-
-Installation
-------------
-
-You can install the released version of nanoarrow from
-`CRAN <https://cran.r-project.org/>`__ with:
-
-.. code:: r
-
- install.packages("nanoarrow")
-
-You can install the development version of nanoarrow from
-`GitHub <https://github.com/>`__ with:
-
-.. code:: r
-
- # install.packages("remotes")
- remotes::install_github("apache/arrow-nanoarrow/r")
-
-If you can load the package, you’re good to go!
-
-.. code:: r
-
- library(nanoarrow)
-
-Example
--------
-
-The Arrow C Data and Arrow C Stream interfaces comprise three
-structures: the ``ArrowSchema``, which represents the data type of an
-array; the ``ArrowArray``, which represents the values of an array; and
-the ``ArrowArrayStream``, which represents zero or more ``ArrowArray``\ s
-with a common ``ArrowSchema``. All three can be wrapped by R objects
-using the nanoarrow R package.
-
-Schemas
-~~~~~~~
-
-Use ``infer_nanoarrow_schema()`` to get the ArrowSchema object that
-corresponds to a given R vector type; use ``as_nanoarrow_schema()`` to
-convert an object from some other data type representation (e.g., an
-arrow R package ``DataType`` like ``arrow::int32()``); or use
-``na_XXX()`` functions to construct them.
-
-.. code:: r
-
- infer_nanoarrow_schema(1:5)
- #> <nanoarrow_schema int32>
- #> $ format : chr "i"
- #> $ name : chr ""
- #> $ metadata : list()
- #> $ flags : int 2
- #> $ children : list()
- #> $ dictionary: NULL
- as_nanoarrow_schema(arrow::schema(col1 = arrow::float64()))
- #> <nanoarrow_schema struct>
- #> $ format : chr "+s"
- #> $ name : chr ""
- #> $ metadata : list()
- #> $ flags : int 0
- #> $ children :List of 1
- #> ..$ col1:<nanoarrow_schema double>
- #> .. ..$ format : chr "g"
- #> .. ..$ name : chr "col1"
- #> .. ..$ metadata : list()
- #> .. ..$ flags : int 2
- #> .. ..$ children : list()
- #> .. ..$ dictionary: NULL
- #> $ dictionary: NULL
- na_int64()
- #> <nanoarrow_schema int64>
- #> $ format : chr "l"
- #> $ name : chr ""
- #> $ metadata : list()
- #> $ flags : int 2
- #> $ children : list()
- #> $ dictionary: NULL
-
-Arrays
-~~~~~~
-
-Use ``as_nanoarrow_array()`` to convert an object to an ArrowArray
-object:
-
-.. code:: r
-
- as_nanoarrow_array(1:5)
- #> <nanoarrow_array int32[5]>
- #> $ length : int 5
- #> $ null_count: int 0
- #> $ offset : int 0
- #> $ buffers :List of 2
- #> ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> ..$ :<nanoarrow_buffer data<int32>[5][20 b]> `1 2 3 4 5`
- #> $ dictionary: NULL
- #> $ children : list()
- as_nanoarrow_array(data.frame(col1 = c(1.1, 2.2)))
- #> <nanoarrow_array struct[2]>
- #> $ length : int 2
- #> $ null_count: int 0
- #> $ offset : int 0
- #> $ buffers :List of 1
- #> ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> $ children :List of 1
- #> ..$ col1:<nanoarrow_array double[2]>
- #> .. ..$ length : int 2
- #> .. ..$ null_count: int 0
- #> .. ..$ offset : int 0
- #> .. ..$ buffers :List of 2
- #> .. .. ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> .. .. ..$ :<nanoarrow_buffer data<double>[2][16 b]> `1.1 2.2`
- #> .. ..$ dictionary: NULL
- #> .. ..$ children : list()
- #> $ dictionary: NULL
-
-You can use ``as.vector()`` or ``as.data.frame()`` to get the R
-representation of the object back:
-
-.. code:: r
-
- array <- as_nanoarrow_array(data.frame(col1 = c(1.1, 2.2)))
- as.data.frame(array)
- #> col1
- #> 1 1.1
- #> 2 2.2
-
-Even though at the C level the ArrowArray is distinct from the
-ArrowSchema, at the R level we attach a schema wherever possible. You
-can access the attached schema using ``infer_nanoarrow_schema()``:
-
-.. code:: r
-
- infer_nanoarrow_schema(array)
- #> <nanoarrow_schema struct>
- #> $ format : chr "+s"
- #> $ name : chr ""
- #> $ metadata : list()
- #> $ flags : int 0
- #> $ children :List of 1
- #> ..$ col1:<nanoarrow_schema double>
- #> .. ..$ format : chr "g"
- #> .. ..$ name : chr "col1"
- #> .. ..$ metadata : list()
- #> .. ..$ flags : int 2
- #> .. ..$ children : list()
- #> .. ..$ dictionary: NULL
- #> $ dictionary: NULL
-
-Array Streams
-~~~~~~~~~~~~~
-
-The easiest way to create an ArrowArrayStream is from a list of arrays
-or objects that can be converted to an array using
-``as_nanoarrow_array()``:
-
-.. code:: r
-
- stream <- basic_array_stream(
- list(
- data.frame(col1 = c(1.1, 2.2)),
- data.frame(col1 = c(3.3, 4.4))
- )
- )
-
-You can pull batches from the stream using the ``$get_next()`` method.
-Once the stream is exhausted, ``$get_next()`` returns ``NULL``.
-
-.. code:: r
-
- stream$get_next()
- #> <nanoarrow_array struct[2]>
- #> $ length : int 2
- #> $ null_count: int 0
- #> $ offset : int 0
- #> $ buffers :List of 1
- #> ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> $ children :List of 1
- #> ..$ col1:<nanoarrow_array double[2]>
- #> .. ..$ length : int 2
- #> .. ..$ null_count: int 0
- #> .. ..$ offset : int 0
- #> .. ..$ buffers :List of 2
- #> .. .. ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> .. .. ..$ :<nanoarrow_buffer data<double>[2][16 b]> `1.1 2.2`
- #> .. ..$ dictionary: NULL
- #> .. ..$ children : list()
- #> $ dictionary: NULL
- stream$get_next()
- #> <nanoarrow_array struct[2]>
- #> $ length : int 2
- #> $ null_count: int 0
- #> $ offset : int 0
- #> $ buffers :List of 1
- #> ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> $ children :List of 1
- #> ..$ col1:<nanoarrow_array double[2]>
- #> .. ..$ length : int 2
- #> .. ..$ null_count: int 0
- #> .. ..$ offset : int 0
- #> .. ..$ buffers :List of 2
- #> .. .. ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
- #> .. .. ..$ :<nanoarrow_buffer data<double>[2][16 b]> `3.3 4.4`
- #> .. ..$ dictionary: NULL
- #> .. ..$ children : list()
- #> $ dictionary: NULL
- stream$get_next()
- #> NULL
-
-You can pull all the batches into a ``data.frame()`` by calling
-``as.data.frame()`` or ``as.vector()``:
-
-.. code:: r
-
- stream <- basic_array_stream(
- list(
- data.frame(col1 = c(1.1, 2.2)),
- data.frame(col1 = c(3.3, 4.4))
- )
- )
-
- as.data.frame(stream)
- #> col1
- #> 1 1.1
- #> 2 2.2
- #> 3 3.3
- #> 4 4.4
-
-After consuming a stream, you should call its ``$release()`` method as
-soon as you can. This lets the implementation of the stream release any
-resources (like open files) it may be holding in a more predictable way
-than waiting for the garbage collector to clean up the object.
-
-Integration with the arrow package
-----------------------------------
-
-The nanoarrow package implements ``as_nanoarrow_schema()``,
-``as_nanoarrow_array()``, and ``as_nanoarrow_array_stream()`` for most
-arrow package types. Similarly, it implements
-``arrow::as_arrow_array()``, ``arrow::as_record_batch()``,
-``arrow::as_arrow_table()``, ``arrow::as_record_batch_reader()``,
-``arrow::infer_type()``, ``arrow::as_data_type()``, and
-``arrow::as_schema()`` for nanoarrow objects such that you can pass
-equivalent nanoarrow objects into many arrow functions and vice versa.
diff --git a/docs/source/reference/cpp.rst b/docs/source/reference/cpp.rst
index 615262f4..7736800f 100644
--- a/docs/source/reference/cpp.rst
+++ b/docs/source/reference/cpp.rst
@@ -37,7 +37,14 @@ Array Stream utilities
.. doxygengroup:: nanoarrow_hpp-array-stream
:members:
-Base classes and utilities
---------------------------
-.. doxygengroup:: nanoarrow_hpp-unique_base
+Buffer utilities
+----------------
+
+.. doxygengroup:: nanoarrow_hpp-buffer
+ :members:
+
+Range-for utilities
+-------------------
+
+.. doxygengroup:: nanoarrow_hpp-range_for
:members:
diff --git a/docs/source/reference/index.rst b/docs/source/reference/index.rst
index 38c6c11d..d0b9fa93 100644
--- a/docs/source/reference/index.rst
+++ b/docs/source/reference/index.rst
@@ -22,7 +22,7 @@ API Reference
:maxdepth: 2
R API Reference <r>
- Python API Reference <python>
+ Python API Reference <python/index>
C API Reference <c>
C++ API Reference <cpp>
Testing API Reference <testing>
diff --git a/docs/source/reference/index.rst b/docs/source/reference/python/advanced.rst
similarity index 71%
copy from docs/source/reference/index.rst
copy to docs/source/reference/python/advanced.rst
index 38c6c11d..631396b1 100644
--- a/docs/source/reference/index.rst
+++ b/docs/source/reference/python/advanced.rst
@@ -15,16 +15,23 @@
.. specific language governing permissions and limitations
.. under the License.
-API Reference
-=============
-
-.. toctree::
- :maxdepth: 2
-
- R API Reference <r>
- Python API Reference <python>
- C API Reference <c>
- C++ API Reference <cpp>
- Testing API Reference <testing>
- IPC Extension Reference <ipc>
- Device Extension Reference <device>
+Low-level Helpers
+=================
+
+C Schema Utilities
+------------------
+
+.. automodule:: nanoarrow.c_schema
+ :members:
+
+C Array Utilities
+-----------------
+
+.. automodule:: nanoarrow.c_array
+ :members:
+
+C ArrayStream Utilities
+-----------------------
+
+.. automodule:: nanoarrow.c_array_stream
+ :members:
diff --git a/docs/source/reference/python.rst b/docs/source/reference/python/array-stream.rst
similarity index 86%
copy from docs/source/reference/python.rst
copy to docs/source/reference/python/array-stream.rst
index 94652478..451a6688 100644
--- a/docs/source/reference/python.rst
+++ b/docs/source/reference/python/array-stream.rst
@@ -15,8 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
-Python API reference
-====================
+High-level ArrayStream Implementation
+=====================================
-.. automodule:: nanoarrow
- :members:
+.. automodule:: nanoarrow.array_stream
+ :members:
diff --git a/docs/source/reference/python.rst b/docs/source/reference/python/array.rst
similarity index 86%
copy from docs/source/reference/python.rst
copy to docs/source/reference/python/array.rst
index 94652478..392c3241 100644
--- a/docs/source/reference/python.rst
+++ b/docs/source/reference/python/array.rst
@@ -15,8 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
-Python API reference
-====================
+High-level Array Implementation
+===============================
-.. automodule:: nanoarrow
- :members:
+.. automodule:: nanoarrow.array
+ :members:
diff --git a/docs/source/reference/python.rst b/docs/source/reference/python/index.rst
similarity index 81%
copy from docs/source/reference/python.rst
copy to docs/source/reference/python/index.rst
index 94652478..7729a7c6 100644
--- a/docs/source/reference/python.rst
+++ b/docs/source/reference/python/index.rst
@@ -19,4 +19,11 @@ Python API reference
====================
.. automodule:: nanoarrow
- :members:
+
+.. toctree::
+ :maxdepth: 2
+
+ Schema/DataType Objects <schema>
+ High-level Array Implementation <array>
+ High-level ArrayStream Implementation <array-stream>
+ Low-level Helpers <advanced>
diff --git a/docs/source/reference/python.rst b/docs/source/reference/python/schema.rst
similarity index 89%
rename from docs/source/reference/python.rst
rename to docs/source/reference/python/schema.rst
index 94652478..b52a20ed 100644
--- a/docs/source/reference/python.rst
+++ b/docs/source/reference/python/schema.rst
@@ -15,8 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
-Python API reference
-====================
+Schema/Data Type Objects
+========================
-.. automodule:: nanoarrow
- :members:
+.. automodule:: nanoarrow.schema
+ :members:
diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 51a5c937..d4911ff5 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -20,7 +20,7 @@ License: Apache License (>= 2)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
-URL: https://github.com/apache/arrow-nanoarrow
+URL: https://arrow.apache.org/nanoarrow/latest/r/, https://github.com/apache/arrow-nanoarrow
BugReports: https://github.com/apache/arrow-nanoarrow/issues
Suggests:
arrow (>= 9.0.0),