bkietz commented on code in PR #37877:
URL: https://github.com/apache/arrow/pull/37877#discussion_r1340758467
##########
docs/source/format/Columnar.rst:
##########
@@ -401,11 +406,17 @@ This layout is adapted from TU Munich's `UmbraDB`_.
.. _variable-size-list-layout:
-Variable-size List Layout
--------------------------
+Variable-size List Layouts
+--------------------------
List is a nested type which is semantically similar to variable-size
-binary. It is defined by two buffers, a validity bitmap and an offsets
+binary. There are two list layout variations — "list" and "list-view" —
+and each variation can use either 32-bit or 64-bit offsets.
Review Comment:
```suggestion
and each variation can be delimited by either 32-bit or 64-bit integers.
```
(to be slightly more generic and include the list sizes here)
##########
docs/source/format/Columnar.rst:
##########
@@ -487,6 +498,103 @@ will be represented as follows: ::
|-------------------------------|-----------------------|
| 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | unspecified (padding) |
+ListView Layout
+~~~~~~~~~~~~~~~
+
+The ListView layout is defined by three buffers instead of just two:
+a validity bitmap, an offsets buffer, and an additional sizes buffer.
+The sizes have the same bit width as the offsets and both 32-bit and 64-bit
+signed integer options are supported. Like in the List layout, the offsets
+reference the child array.
+
+Rather then inferring list lengths from the offsets, the sizes buffer
+stores the length of each list in the array. This in turn allows offsets to be
+out of order. Elements of the child array do not have to be stored in the
+same order they logically appear in the list elements of the parent array.
+
+When a value is null, the corresponding offset and size can have arbitrary
+values. When size is 0, the corresponding offset can have an arbitrary value.
+If choosing a value is possible, we recommend setting offsets and sizes to 0 in
+these cases.
+
+A list-view type is specified like ``ListView<T>``, where ``T`` is any type
+(primitive or nested). In these examples we use 32-bit offsets where
+the 64-bit offset version would be denoted by ``LargeListView<T>``.
+
+**Example Layout: ``List<Int8>`` Array**
Review Comment:
```suggestion
**Example Layout: ``ListView<Int8>`` Array**
```
##########
docs/source/format/Columnar.rst:
##########
@@ -487,6 +498,103 @@ will be represented as follows: ::
|-------------------------------|-----------------------|
| 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | unspecified (padding) |
+ListView Layout
+~~~~~~~~~~~~~~~
+
+The ListView layout is defined by three buffers instead of just two:
+a validity bitmap, an offsets buffer, and an additional sizes buffer.
+The sizes have the same bit width as the offsets and both 32-bit and 64-bit
+signed integer options are supported. Like in the List layout, the offsets
+reference the child array.
+
+Rather then inferring list lengths from the offsets, the sizes buffer
+stores the length of each list in the array. This in turn allows offsets to be
+out of order. Elements of the child array do not have to be stored in the
+same order they logically appear in the list elements of the parent array.
+
+When a value is null, the corresponding offset and size can have arbitrary
+values. When size is 0, the corresponding offset can have an arbitrary value.
+If choosing a value is possible, we recommend setting offsets and sizes to 0 in
+these cases.
+
+A list-view type is specified like ``ListView<T>``, where ``T`` is any type
+(primitive or nested). In these examples we use 32-bit offsets where
+the 64-bit offset version would be denoted by ``LargeListView<T>``.
+
+**Example Layout: ``List<Int8>`` Array**
+
+We illustrate an example of ``ListView<Int8>`` with length 4 having values::
+
+ [[12, -7, 25], null, [0, -127, 127, 50], []]
+
+will have the following representation: ::
+
+ * Length: 4, Null count: 1
+ * Validity bitmap buffer:
+
+ | Byte 0 (validity bitmap) | Bytes 1-63 |
+ |--------------------------|-----------------------|
+ | 00001101 | 0 (padding) |
+
+ * Offsets buffer (int32)
+
+ | Bytes 0-3  | Bytes 4-7   | Bytes 8-11  | Bytes 12-15 | Bytes 16-63           |
+ |------------|-------------|-------------|-------------|-----------------------|
+ | 0          | unspecified | 3           | unspecified | unspecified (padding) |
+
+ * Sizes buffer (int32)
+
+ | Bytes 0-3  | Bytes 4-7   | Bytes 8-11  | Bytes 12-15 | Bytes 16-63           |
+ |------------|-------------|-------------|-------------|-----------------------|
+ | 3          | unspecified | 4           | 0           | unspecified (padding) |
+
+ * Values array (Int8array):
Review Comment:
```suggestion
* Values array (Int8Array):
```
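For anyone checking the example in this hunk by hand, here is a minimal Python sketch of how the three buffers could be decoded. The values buffer is cut off in the quoted diff, so `values` below is an assumption inferred from the logical lists together with the offsets and sizes shown. `decode_list_view` is a hypothetical helper for illustration, not an Arrow API, and the "unspecified" offset and size entries are set to 0 here.

```python
def decode_list_view(validity, offsets, sizes, values):
    """Materialize a list-view array as Python lists (None for null slots)."""
    out = []
    for valid, offset, size in zip(validity, offsets, sizes):
        if not valid:
            # Offset and size of a null slot may hold arbitrary values; ignore them.
            out.append(None)
        else:
            out.append(values[offset:offset + size])
    return out


# Bitmap 00001101 read least-significant bit first: slots 0, 2 and 3 are valid.
validity = [True, False, True, True]
offsets = [0, 0, 3, 0]   # "unspecified" entries chosen as 0
sizes = [3, 0, 4, 0]     # "unspecified" entry chosen as 0
values = [12, -7, 25, 0, -127, 127, 50]  # assumed: not shown in the quoted hunk

decoded = decode_list_view(validity, offsets, sizes, values)
print(decoded)
```

The decoded result should match the logical array given at the top of the example, with the null slot surfacing as `None`.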
##########
docs/source/format/Columnar.rst:
##########
@@ -487,6 +498,103 @@ will be represented as follows: ::
|-------------------------------|-----------------------|
| 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | unspecified (padding) |
+ListView Layout
+~~~~~~~~~~~~~~~
+
+The ListView layout is defined by three buffers instead of just two:
+a validity bitmap, an offsets buffer, and an additional sizes buffer.
+The sizes have the same bit width as the offsets and both 32-bit and 64-bit
+signed integer options are supported. Like in the List layout, the offsets
+reference the child array.
+
+Rather then inferring list lengths from the offsets, the sizes buffer
+stores the length of each list in the array. This in turn allows offsets to be
Review Comment:
```suggestion
Sizes and offsets have the same bit width, and both 32-bit and 64-bit
signed integer options are supported.
As in the List layout, the offsets encode the start position of each slot
in the child array. In contrast to the List layout, list lengths are stored
explicitly in the sizes buffer instead of inferred. This allows offsets to be
```
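To make the "stored explicitly instead of inferred" contrast concrete, a rough Python sketch follows. It assumes plain Python lists stand in for the buffers, and `list_offsets_to_list_view` is a hypothetical helper name, not part of any Arrow library. It derives list-view offsets and sizes from List-layout offsets, then shows that the resulting pairs can be reordered without touching the child array.

```python
def list_offsets_to_list_view(list_offsets):
    """Derive list-view (offsets, sizes) from List-layout offsets.

    In the List layout the length of slot i is inferred as
    list_offsets[i + 1] - list_offsets[i]; here it is stored explicitly.
    """
    offsets = list(list_offsets[:-1])
    sizes = [end - start for start, end in zip(list_offsets[:-1], list_offsets[1:])]
    return offsets, sizes


# List-layout offsets for four slots with lengths 3, 0, 4, 0.
list_offsets = [0, 3, 3, 7, 7]
view_offsets, view_sizes = list_offsets_to_list_view(list_offsets)

# Because each slot carries its own size, the (offset, size) pairs may appear
# in any order; reversing them describes the same lists in reverse order
# while the child array stays unchanged.
reversed_offsets = view_offsets[::-1]
reversed_sizes = view_sizes[::-1]
print(view_offsets, view_sizes)
print(reversed_offsets, reversed_sizes)
```

The reversed pairs are no longer monotonically increasing, which the List layout could not express with a single offsets buffer.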
##########
docs/source/format/Columnar.rst:
##########
@@ -487,6 +498,103 @@ will be represented as follows: ::
|-------------------------------|-----------------------|
| 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | unspecified (padding) |
+ListView Layout
+~~~~~~~~~~~~~~~
+
+The ListView layout is defined by three buffers instead of just two:
Review Comment:
```suggestion
The ListView layout is defined by three buffers instead of List layout's two:
```
or maybe
```suggestion
The ListView layout is defined by three buffers:
```
to avoid the slight implication that List layout is deficient instead of
simply different