paleolimbot commented on code in PR #464:
URL: https://github.com/apache/arrow-nanoarrow/pull/464#discussion_r1600375320
##########
python/src/nanoarrow/visitor.py:
##########
@@ -74,9 +75,60 @@ def to_columns(obj, schema=None) -> Tuple[List[str], List[Sequence]]:
     >>> names
     ['col1']
     >>> columns
-    [[1, 2, 3]]
+    [nanoarrow.c_lib.CBuffer(int64[24 b] 1 2 3)]
     """
-    return ColumnsBuilder.visit(obj, schema)
+    return ColumnsBuilder.visit(obj, schema, handle_nulls=handle_nulls)
+
+
+def nulls_forbid() -> Callable[[CBuffer, Sequence], Sequence]:
+    def handle(is_valid, data):
+        if len(is_valid) > 0:
+            raise ValueError("Null present with null_handler=nulls_forbid()")
+
+        return data
+
+    return handle
+
+
+def nulls_debug() -> Callable[[CBuffer, Sequence], Tuple[CBuffer, Sequence]]:
+    def handle(is_valid, data):
+        return is_valid, data
+
+    return handle
+
+
+def nulls_as_sentinel(sentinel=None):
+    from numpy import array, result_type
+
+    def handle(is_valid, data):
+        is_valid = array(is_valid, copy=False)
+        data = array(data, copy=False)
Review Comment:
> is that already unpacked at this point?

Yes, all bitmaps are unpacked before they get here. We could also export a
bitmap, but that is supported basically nowhere: neither pandas nor numpy
seems to accept a bitmap for marking nulls/missing data, or for some kind of
subset-assign that would mark them otherwise.
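
To make "unpacked" concrete, here is a minimal pure-Python sketch of the idea, not the code in this PR. It assumes the Arrow convention that validity bitmaps are least-significant-bit first and that a set bit means "valid". The names `unpack_validity` and `fill_nulls` are hypothetical helpers made up for illustration.

```python
from typing import List, Sequence


def unpack_validity(bitmap: bytes, length: int, offset: int = 0) -> List[bool]:
    """Expand a packed validity bitmap into one bool per element.

    Assumes Arrow's layout: element i lives in bit (i % 8) of byte (i // 8),
    and a set bit means the element is valid (not null).
    """
    return [
        bool((bitmap[i // 8] >> (i % 8)) & 1)
        for i in range(offset, offset + length)
    ]


def fill_nulls(is_valid: Sequence[bool], data: Sequence, sentinel=None) -> list:
    """Hypothetical helper: replace every invalid slot with a sentinel."""
    return [value if valid else sentinel for valid, value in zip(is_valid, data)]


# 0b00000101 marks elements 0 and 2 as valid and element 1 as null.
is_valid = unpack_validity(bytes([0b00000101]), length=3)
filled = fill_nulls(is_valid, [1, 2, 3], sentinel=None)
print(is_valid)  # [True, False, True]
print(filled)    # [1, None, 3]
```

Once the mask is one element per value, it can be used directly as a boolean index for subset-assignment, which is the operation a packed bitmap cannot drive on its own.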

One thing the buffer implementation exported here does not support yet is
exporting a typed writable buffer. Technically all of these buffers could be
written to (we just allocated every one of them), which might make some of
these handlers, or downstream modifications, faster. On the flip side, we could
try harder to *not* copy things in the case where we only ever saw one big
array.
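
As a rough illustration of what a typed writable buffer buys a consumer, the sketch below uses only the standard library's `memoryview`. It is not nanoarrow's buffer implementation: the `bytearray` is an assumed stand-in for a freshly allocated int64 data buffer, and the read-only view is an assumed stand-in for an export that does not advertise writability.

```python
from array import array

# Stand-in for a buffer we just allocated: three int64 values.
owned = bytearray(array("q", [1, 2, 3]).tobytes())

# A read-only typed view (backed by an immutable copy) versus a
# writable typed view over the memory we own.
readonly_view = memoryview(bytes(owned)).cast("q")
writable_view = memoryview(owned).cast("q")

# With a writable export, a consumer can patch a value in place, no copy.
writable_view[1] = -1

# With a read-only export, the same assignment is rejected, so the
# consumer has to copy the data before modifying it.
try:
    readonly_view[1] = -1
    readonly_write_failed = False
except TypeError:
    readonly_write_failed = True

print(writable_view.tolist())  # [1, -1, 3]
print(readonly_write_failed)   # True
```

The copy a consumer is forced to make in the read-only case is the cost that a writable export could avoid.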