This is an automated email from the ASF dual-hosted git repository.

zanmato pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 5240670813 GH-46209: [Documentation][C++][Compute] Add cpp developer documentation for row table (#46210)
5240670813 is described below

commit 5240670813a2dac6386eb854f060384a3db946d1
Author: Rossi Sun <[email protected]>
AuthorDate: Mon May 12 10:27:58 2025 -0700

    GH-46209: [Documentation][C++][Compute] Add cpp developer documentation for row table (#46210)
    
    ### What changes are included in this PR?
    
    Add cpp developer documentation for row table, placing it under the compute category.
    
    ### Are these changes tested?
    
    No need.
    
    ### Are there any user-facing changes?
    
    None.
    * GitHub Issue: #46209
    
    Lead-authored-by: Rossi Sun <[email protected]>
    Co-authored-by: Raúl Cumplido <[email protected]>
    Co-authored-by: Bryce Mecum <[email protected]>
    Signed-off-by: Rossi Sun <[email protected]>
---
 docs/source/cpp/index.rst              |  20 +++-
 docs/source/developers/cpp/compute.rst | 182 +++++++++++++++++++++++++++++++++
 docs/source/developers/cpp/index.rst   |   1 +
 3 files changed, 201 insertions(+), 2 deletions(-)

diff --git a/docs/source/cpp/index.rst b/docs/source/cpp/index.rst
index ee0434ac0f..c844ed2faa 100644
--- a/docs/source/cpp/index.rst
+++ b/docs/source/cpp/index.rst
@@ -96,11 +96,26 @@ Welcome to the Apache Arrow C++ implementation documentation!
 
          To the API Reference
 
-.. grid:: 1
+.. grid:: 1 2 2 2
    :gutter: 4
    :padding: 2 2 0 0
    :class-container: sd-text-center
 
+   .. grid-item-card:: C++ Development
+      :class-card: contrib-card
+      :shadow: none
+
+      Find guidelines and documentation for Arrow C++ developers
+
+      +++
+
+      .. button-link:: ../developers/cpp/index.html
+         :click-parent:
+         :color: primary
+         :expand:
+
+         To C++ Development
+
    .. grid-item-card:: Cookbook
       :class-card: contrib-card
       :shadow: none
@@ -126,4 +141,5 @@ Welcome to the Apache Arrow C++ implementation documentation!
    user_guide
    Examples <examples/index>
    api
-   C++ cookbook <https://arrow.apache.org/cookbook/cpp/>
+   C++ Development <../developers/cpp/index>
+   C++ Cookbook <https://arrow.apache.org/cookbook/cpp/>
diff --git a/docs/source/developers/cpp/compute.rst b/docs/source/developers/cpp/compute.rst
new file mode 100644
index 0000000000..21391ff5fb
--- /dev/null
+++ b/docs/source/developers/cpp/compute.rst
@@ -0,0 +1,182 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. highlight:: console
+.. _development-cpp-compute:
+
+============================
+Developing Arrow C++ Compute
+============================
+
+This section provides information for developers of the Arrow C++ Compute module.
+
+Row Table
+=========
+
+The row table in Arrow stores data in row-major format. This format is
+particularly useful when individual rows are accessed randomly and all columns
+of a row are frequently accessed together. It is especially well suited to
+hash-table keys, and it makes operations such as grouping and hash joins
+efficient by improving memory access patterns and data locality.
+
+Metadata
+--------
+
+A row table is defined by its metadata, ``RowTableMetadata``, which includes
+information about its schema, alignment, and derived properties.
+
+The schema specifies the types and order of columns. Each row in the row table
+contains the data for each column in that logical order (the physical order may
+vary; see :ref:`row-encoding` for details).
+
+.. note::
+   Columns of nested types or large binary types are **not** supported in the
+   row table.
+
+One important property derived from the schema is whether the row table is
+fixed-length or varying-length. A fixed-length row table contains only
+fixed-length columns, while a varying-length row table includes at least one
+varying-length column. This distinction determines how data is stored and
+accessed in the row table.
+
+Each row in the row table is aligned to ``RowTableMetadata::row_alignment``
+bytes. Fixed-length columns with non-power-of-2 lengths are also aligned to
+``RowTableMetadata::row_alignment`` bytes. Varying-length columns are aligned to
+``RowTableMetadata::string_alignment`` bytes.
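
A minimal C++ sketch of the fixed-length/varying-length distinction and the alignment rules above, assuming a power-of-two alignment. The names ``ColumnInfo``, ``PadToAlignment``, ``IsFixedLengthTable`` and ``FixedRowWidth`` are hypothetical and are not Arrow APIs. For simplicity the sketch keeps columns in schema order, although the physical order Arrow uses may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical column descriptor for illustration only; not an Arrow type.
struct ColumnInfo {
  bool is_fixed_length;
  uint32_t fixed_width;  // byte width; ignored for varying-length columns
};

// Round `size` up to the next multiple of `alignment` (must be a power of two).
inline uint32_t PadToAlignment(uint32_t size, uint32_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

inline bool IsPowerOfTwo(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// A row table is fixed-length only if every column is fixed-length.
inline bool IsFixedLengthTable(const std::vector<ColumnInfo>& cols) {
  for (const auto& c : cols) {
    if (!c.is_fixed_length) return false;
  }
  return true;
}

// Width of one row in a fixed-length table (schema order assumed).
inline uint32_t FixedRowWidth(const std::vector<ColumnInfo>& cols,
                              uint32_t row_alignment) {
  uint32_t offset = 0;
  for (const auto& c : cols) {
    // Columns whose width is not a power of two start on a row_alignment boundary.
    if (c.fixed_width != 0 && !IsPowerOfTwo(c.fixed_width)) {
      offset = PadToAlignment(offset, row_alignment);
    }
    offset += c.fixed_width;
  }
  // The row as a whole is padded to row_alignment.
  return PadToAlignment(offset, row_alignment);
}
```

With ``row_alignment = 8``, an ``(int32, boolean)`` row holds 5 bytes of data and is padded to 8 bytes.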
+
+Buffer Layout
+-------------
+
+Similar to most Arrow ``Array``\s, the row table consists of three buffers:
+
+- **Null Masks Buffer**: Indicates null values for each column in each row.
+- **Fixed-length Buffer**: Stores row data for fixed-length tables or offsets to
+  varying-length data for varying-length tables.
+- **Varying-length Buffer** (Optional): Contains row data for varying-length
+  tables; unused for fixed-length tables.
+
+Row Format
+----------
+
+Null Masks
+~~~~~~~~~~
+
+For each row, a contiguous sequence of bits represents whether each column in
+that row is null. Each bit corresponds to a specific column, with ``1``
+indicating the value is null and ``0`` indicating the value is valid. Note that
+this is the opposite of how the validity bitmap works for ``Array``\s. The null
+mask for a row occupies ``RowTableMetadata::null_masks_bytes_per_row`` bytes.
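
The null-mask lookup described above can be sketched as follows. The helper names are hypothetical. The LSB-first bit order within each byte is an assumption borrowed from Arrow's validity bitmaps. Note the inverted meaning: a set bit means null.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if column `col` of row `row` is null.
// Assumes bits are numbered LSB-first within each byte (an assumption).
inline bool IsNullInRow(const uint8_t* null_masks, int64_t row, int col,
                        int null_masks_bytes_per_row) {
  const uint8_t* mask = null_masks + row * null_masks_bytes_per_row;
  return (mask[col / 8] >> (col % 8)) & 1;
}

// Hypothetical helper: mark column `col` of row `row` as null (bit = 1).
inline void SetNullInRow(uint8_t* null_masks, int64_t row, int col,
                         int null_masks_bytes_per_row) {
  uint8_t* mask = null_masks + row * null_masks_bytes_per_row;
  mask[col / 8] |= static_cast<uint8_t>(1u << (col % 8));
}
```

For a 10-column table, each row's mask needs 2 bytes, and column 9 of row 1 lives in bit 1 of the fourth byte of the buffer.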
+
+Fixed-length Row Data
+~~~~~~~~~~~~~~~~~~~~~
+
+In a fixed-length row table, row data is directly stored in the fixed-length
+buffer. All columns in each row are stored sequentially. Notably, a ``boolean``
+column is special because, in a normal Arrow ``Array``, it is stored using 1
+bit, whereas in a row table, it occupies 1 byte. The varying-length buffer is
+not used in this case.
+
+For example, a row table with the schema ``(int32, boolean)`` and rows
+``[[7, false], [8, true], [9, false], ...]`` is stored in the fixed-length
+buffer as follows:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Row 0
+     - Row 1
+     - Row 2
+     - ...
+   * - ``7 0 0 0, 0 (padding)``
+     - ``8 0 0 0, 1 (padding)``
+     - ``9 0 0 0, 0 (padding)``
+     - ...
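
The fixed-length example above can be reproduced with a small sketch. It assumes a little-endian host and ``row_alignment = 8``, and it zero-fills the padding bytes for determinism. ``EncodeInt32BoolRows`` is a hypothetical helper, not part of Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed row alignment for this illustration.
constexpr uint32_t kRowAlignment = 8;
// 4 bytes (int32) + 1 byte (boolean), rounded up to kRowAlignment -> 8.
constexpr uint32_t kRowWidth =
    (sizeof(int32_t) + 1 + kRowAlignment - 1) / kRowAlignment * kRowAlignment;

// Hypothetical helper: encode (int32, boolean) rows into a fixed-length buffer.
std::vector<uint8_t> EncodeInt32BoolRows(const std::vector<int32_t>& ints,
                                         const std::vector<bool>& bools) {
  std::vector<uint8_t> buffer(ints.size() * kRowWidth, 0);  // padding stays 0
  for (size_t i = 0; i < ints.size(); ++i) {
    uint8_t* row = buffer.data() + i * kRowWidth;
    std::memcpy(row, &ints[i], sizeof(int32_t));  // bytes 0..3, little-endian
    row[4] = bools[i] ? 1 : 0;  // a boolean takes a whole byte, not a bit
  }
  return buffer;
}
```

Encoding ``[[7, false], [8, true], [9, false]]`` produces three 8-byte rows that match the table.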
+
+Offsets for Varying-length Row Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In a varying-length row table, the fixed-length buffer contains offsets to the
+varying-length row data, which is stored separately in the optional
+varying-length buffer. The offsets are of type ``RowTableMetadata::offset_type``
+(fixed as ``int64_t``) and indicate the starting position of the row data for
+each row.
+
+Varying-length Row Data
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In a varying-length row table, the varying-length buffer contains the actual row
+data, stored contiguously. The offsets in the fixed-length buffer point to the
+starting position of each row's data.
+
+.. _row-encoding:
+
+Row Encoding
+^^^^^^^^^^^^
+
+A varying-length row is encoded as follows:
+
+- Fixed-length columns are stored first.
+- A sequence of offsets to each varying-length column follows. Each offset is
+  32-bit and indicates the **end** position within the row data of the
+  corresponding varying-length column.
+- Varying-length columns are stored last.
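
The encoding rules above can be sketched for a single row as follows. ``EncodeVaryingRow`` is a hypothetical helper, not Arrow's row encoder. It takes the already-encoded fixed-length columns and assumes a little-endian host. It aligns each varying-length column to ``string_alignment``, pads the row to ``row_alignment``, and zero-fills padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

inline uint32_t PadTo(uint32_t size, uint32_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: layout is
// [fixed-length cols][uint32 END offset per varying col][varying cols].
std::vector<uint8_t> EncodeVaryingRow(const std::vector<uint8_t>& fixed,
                                      const std::vector<std::string>& varying,
                                      uint32_t string_alignment,
                                      uint32_t row_alignment) {
  const uint32_t offsets_pos = static_cast<uint32_t>(fixed.size());
  uint32_t pos = offsets_pos +
                 static_cast<uint32_t>(varying.size() * sizeof(uint32_t));
  std::vector<uint32_t> starts;
  std::vector<uint32_t> ends;
  for (const auto& s : varying) {
    pos = PadTo(pos, string_alignment);  // each varying column is aligned
    starts.push_back(pos);
    pos += static_cast<uint32_t>(s.size());
    ends.push_back(pos);  // the stored offset marks the END of the column
  }
  std::vector<uint8_t> row(PadTo(pos, row_alignment), 0);  // zeroed padding
  if (!fixed.empty()) std::memcpy(row.data(), fixed.data(), fixed.size());
  if (!ends.empty()) {
    std::memcpy(row.data() + offsets_pos, ends.data(),
                ends.size() * sizeof(uint32_t));
  }
  for (size_t k = 0; k < varying.size(); ++k) {
    if (!varying[k].empty()) {
      std::memcpy(row.data() + starts[k], varying[k].data(), varying[k].size());
    }
  }
  return row;
}
```

Encoding ``(7, 'Alice', 'x', 0)`` with 8-byte alignments yields a 32-byte row whose end offsets are 21 and 25, matching the example that follows.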
+
+For example, a row table with the schema ``(int32, string, string, int32)`` and
+rows ``[[7, 'Alice', 'x', 0], [8, 'Bob', 'y', 1], [9, 'Charlotte', 'z', 2], ...]``
+is stored as follows (assuming 8-byte alignment for both rows and varying-length columns):
+
+Fixed-length buffer (row offsets):
+
+.. list-table::
+   :header-rows: 1
+
+   * - Row 0
+     - Row 1
+     - Row 2
+     - Row 3
+     - ...
+   * - ``0 0 0 0 0 0 0 0``
+     - ``32 0 0 0 0 0 0 0``
+     - ``64 0 0 0 0 0 0 0``
+     - ``104 0 0 0 0 0 0 0``
+     - ...
+
+Varying-length buffer (row data; ``~`` denotes a padding byte):
+
+.. list-table::
+   :header-rows: 1
+
+   * - Row
+     - Fixed-length Cols
+     - Varying-length Offsets
+     - Varying-length Cols
+   * - 0
+     - ``7 0 0 0, 0 0 0 0``
+     - ``21 0 0 0, 25 0 0 0``
+     - ``Alice~~~x~~~~~~~``
+   * - 1
+     - ``8 0 0 0, 1 0 0 0``
+     - ``19 0 0 0, 25 0 0 0``
+     - ``Bob~~~~~y~~~~~~~``
+   * - 2
+     - ``9 0 0 0, 2 0 0 0``
+     - ``25 0 0 0, 33 0 0 0``
+     - ``Charlotte~~~~~~~z~~~~~~~``
+   * - 3
+     - ...
+     - ...
+     - ...
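
The following sketch reads a varying-length column back out of an encoded row. ``ReadVaryingColumn`` is hypothetical and assumes a little-endian host. Column 0 starts right after the end-offset array, and column ``k > 0`` starts at the end of column ``k - 1``. In both cases the start is rounded up to ``string_alignment``.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

inline uint32_t AlignUp(uint32_t x, uint32_t a) { return (x + a - 1) / a * a; }

// Hypothetical helper: extract varying column k (0-based among varying columns).
// `offsets_pos` is where the uint32 end-offset array begins in the row.
std::string ReadVaryingColumn(const uint8_t* row, uint32_t offsets_pos,
                              uint32_t num_varying, uint32_t k,
                              uint32_t string_alignment) {
  auto end_of = [&](uint32_t i) -> uint32_t {
    uint32_t end;
    std::memcpy(&end, row + offsets_pos + i * sizeof(uint32_t), sizeof(end));
    return end;
  };
  // Unaligned start: after the offset array for k == 0, else end of column k - 1.
  uint32_t start =
      (k == 0) ? static_cast<uint32_t>(offsets_pos + num_varying * sizeof(uint32_t))
               : end_of(k - 1);
  start = AlignUp(start, string_alignment);
  uint32_t end = end_of(k);
  return std::string(reinterpret_cast<const char*>(row + start), end - start);
}
```

For row 2 of the table above (offset array at byte 8, two varying columns, 8-byte alignment), it returns ``Charlotte`` for ``k = 0`` and ``z`` for ``k = 1``.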
diff --git a/docs/source/developers/cpp/index.rst b/docs/source/developers/cpp/index.rst
index 603e1607dc..ec97d4a62a 100644
--- a/docs/source/developers/cpp/index.rst
+++ b/docs/source/developers/cpp/index.rst
@@ -30,3 +30,4 @@ C++ Development
    emscripten
    conventions
    fuzzing
+   compute
