[PR] feat(python): Add column-wise buffer builder [arrow-nanoarrow]

via GitHub Mon, 13 May 2024 12:53:01 -0700


paleolimbot opened a new pull request, #464:
URL: https://github.com/apache/arrow-nanoarrow/pull/464


   This PR implements buffer-wise column building for the types where that 
makes sense. (I am still working out the details of how to inject null 
handling here.)
   
   ```python
   import nanoarrow as na
   from nanoarrow import visitor
   import pyarrow as pa
   
   batch = pa.record_batch({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
   batch_with_nulls = pa.record_batch({"col1": [1, None, 3], "col2": ["a", "b", None]})
   
   # Either builds a buffer or a list depending on column types
   visitor.to_columns(batch)
   #> (['col1', 'col2'],
   #>  [nanoarrow.c_lib.CBuffer(int64[24 b] 1 2 3), ['a', 'b', 'c']])
   
   # One can inject a null handler (a few experimental ones provided)
   visitor.to_columns(batch_with_nulls, handle_nulls=visitor.nulls_as_masked_array())
   #> (['col1', 'col2'],
   #>  [masked_array(data=[1, --, 3],
   #>                mask=[False,  True, False],
   #>          fill_value=999999,
   #>               dtype=int64),
   #>   ['a', 'b', None]])
   
   visitor.to_columns(batch_with_nulls, handle_nulls=visitor.nulls_as_sentinel())
   #> (['col1', 'col2'], [array([ 1., nan,  3.]), ['a', 'b', None]])
   ```
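
For context on what a null handler has to do, here is a minimal standard-library sketch. It is not the implementation in this PR: the helper names `unpack_validity`, `with_none`, and `with_sentinel` are hypothetical and exist only for illustration. It assumes an Arrow-style validity bitmap (least-significant bit first, 1 meaning valid) stored next to a data buffer whose null slots hold arbitrary placeholder values.

```python
import math
from array import array


def unpack_validity(bitmap, length, offset=0):
    """Expand an LSB-first validity bitmap (1 = valid) into a list of bools."""
    return [
        bool((bitmap[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]


def with_none(values, is_valid):
    """Hypothetical handler: replace null slots with None (yields a list)."""
    return [v if ok else None for v, ok in zip(values, is_valid)]


def with_sentinel(values, is_valid, sentinel=math.nan):
    """Hypothetical handler: convert to float and put a sentinel in null slots."""
    return [float(v) if ok else sentinel for v, ok in zip(values, is_valid)]


# An int64 data buffer; the middle slot is null, so its stored value is a placeholder.
values = array("q", [1, 0, 3])
# Bits 0 and 2 are set, so elements 0 and 2 are valid and element 1 is null.
bitmap = bytes([0b00000101])

is_valid = unpack_validity(bitmap, len(values))
as_objects = with_none(values, is_valid)
as_floats = with_sentinel(values, is_valid)
```

The sentinel variant has to widen the integers to floats so that NaN is representable, which mirrors the float output of the `nulls_as_sentinel()` example above. The `None` variant keeps the integer values but gives up the contiguous buffer.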
   


