Re: [PR] feat(python): Add array creation/building from buffers [arrow-nanoarrow]

via GitHub Thu, 15 Feb 2024 11:17:19 -0800


paleolimbot commented on code in PR #378:
URL: https://github.com/apache/arrow-nanoarrow/pull/378#discussion_r1491503688



##########
python/src/nanoarrow/c_lib.py:
##########
@@ -120,15 +157,134 @@ def c_array(obj=None, requested_schema=None) -> CArray:
             *obj.__arrow_c_array__(requested_schema=requested_schema_capsule)
         )
 
-    # for pyarrow < 14.0
-    if hasattr(obj, "_export_to_c"):
+    # Try buffer protocol (e.g., numpy arrays or a c_buffer())
+    if _obj_is_buffer(obj):
+        return _c_array_from_pybuffer(obj)
+
+    # Try import of bare capsule
+    if _obj_is_capsule(obj, "arrow_array"):
+        if requested_schema is None:
+            requested_schema_capsule = CSchema.allocate()._capsule
+        else:
+            requested_schema_capsule = requested_schema.__arrow_c_schema__()
+
+        return CArray._import_from_c_capsule(requested_schema_capsule, obj)
+
+    # Try _export_to_c for Array/RecordBatch objects if pyarrow < 14.0
+    if _obj_is_pyarrow_array(obj):
         out = CArray.allocate(CSchema.allocate())
         obj._export_to_c(out._addr(), out.schema._addr())
         return out
-    else:
-        raise TypeError(
-            f"Can't convert object of type {type(obj).__name__} to nanoarrow.c_array"
-        )
+
+    # Try import of iterable
+    if _obj_is_iterable(obj):
+        return _c_array_from_iterable(obj, requested_schema)
+
+    raise TypeError(
+        f"Can't convert object of type {type(obj).__name__} to nanoarrow.c_array"
+    )
+
+
+def c_array_from_buffers(
+    schema,
+    length: int,
+    buffers: Iterable[Any],
+    null_count: int = -1,
+    offset: int = 0,
+    children: Iterable[Any] = (),
+    validation_level: Literal["full", "default", "minimal", "none"] = "default",
+) -> CArray:
+    """Create an ArrowArray wrapper from components
+
+    Given a schema, build an ArrowArray buffer-wise. This allows almost any array
+    to be assembled; however, requires some knowledge of the Arrow Columnar
+    specification. This function will do its best to validate the sizes and
+    content of buffers according to ``validation_level``, which can be set
+    to ``"full"`` for maximum safety.
+
+    Parameters
+    ----------
+
+    schema : schema-like
+        The data type of the desired array as sanitized by :func:`c_schema`.
+    length : int
+        The length of the output array.
+    buffers : Iterable of buffer-like or None
+        An iterable of buffers as sanitized by :func:`c_buffer`. Any object
+        supporting the Python Buffer protocol is accepted. Buffer data types
+        are not checked. A buffer value of ``None`` will skip setting a buffer
+        (i.e., that buffer will be of length zero and its pointer will
+        be ``NULL``).
+    null_count : int, optional
+        The number of null values, if known in advance. If -1 (the default),
+        the null count will be calculated based on the validity bitmap. If
+        the validity bitmap was set to ``None``, the calculated null count
+        will be zero.
+    offset : int, optional
+        The logical offset from the start of the array.
+    children : Iterable of array-like
+        An iterable of arrays used to set child fields of the array. Can contain
+        any object accepted by :func:`c_array`. Must contain the exact number of
+        required children as specified by ``schema``.
+    validation_level: str, optional
+        One of "none" (no check), "minimal" (check buffer sizes that do not require
+        dereferencing buffer content), "default" (check all buffer sizes), or "full"
+        (check all buffer sizes and all buffer content).
+
+    Examples
+    --------
+
+    >>> import nanoarrow as na
+    >>> c_array = na.c_array_from_buffers(na.uint8(), 5, [None, b"12345"])
+    >>> na.c_array_view(c_array)
+    <nanoarrow.c_lib.CArrayView>
+    - storage_type: 'uint8'
+    - length: 5
+    - offset: 0
+    - null_count: 0
+    - buffers[2]:
+      - validity <bool[0 b] >
+      - data <uint8[5 b] 49 50 51 52 53>
+    - dictionary: NULL
+    - children[0]:
+    """
+    schema = c_schema(schema)
+    builder = CArrayBuilder.allocate()
+
+    # This is slightly wasteful: it will allocate arrays recursively and we are about
+    # to immediately release them and replace them with another value. We could also
+    # create an ArrowArrayView from the buffers, which would make it more
+    # straightforward to check the buffer types and avoid the extra structure
+    # allocation.
+    builder.init_from_schema(schema)
+
+    # Set buffers. This moves ownership of the buffers as well (i.e., the objects
+    # in the input buffers are replaced with an empty ArrowBuffer)

Review Comment:
   I made the `ArrowBufferMove()` explicit (and made the "move + invalidate previous" behaviour opt-in), since it's definitely confusing if you don't expect it to happen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
