Re: [I] PyArrow Capsule from Nanoarrow-built Interval Arrow Yields Unexpected Values [arrow]

via GitHub Sat, 27 Jan 2024 01:45:56 -0800


jorisvandenbossche commented on issue #39816:
URL: https://github.com/apache/arrow/issues/39816#issuecomment-1913097288


   I quickly tested your MRE vs pyarrow using nanoarrow-python to inspect the 
data:
   
   PyArrow:
   
   ```
   import pyarrow as pa
   import nanoarrow as na
   schema = pa.schema([("interval", pa.month_day_nano_interval())])
   tbl = pa.Table.from_arrays([pa.array(
       [
           None,
           pa.scalar((1, 1, 1), type=pa.month_day_nano_interval()),
           pa.scalar((42, 42, 42), type=pa.month_day_nano_interval()),
           None,
       ]
   )], schema=schema)
   
   In [5]: stream = na.c_array_stream(tbl)
   
   In [6]: arr = stream.get_next().child(0)
   
   In [7]: arr
   Out[7]: 
   <nanoarrow.c_lib.CArray interval_month_day_nano>
   - length: 4
   - offset: 0
   - null_count: 2
   - buffers: (140484108394496, 140484108394560)
   - dictionary: NULL
   - children[0]:
   
   In [8]: na.c_array_view(arr)
   Out[8]: 
   <nanoarrow.c_lib.CArrayView>
   - storage_type: 'interval_month_day_nano'
   - length: 4
   - offset: 0
   - null_count: 2
   - buffers[2]:
     - <bool validity[1 b] 01100000>
     - <interval_month_day_nano data[64 b] (0, 0, 0) (1, 1, 1) (42, 42, 42) (0, ...>
   - dictionary: NULL
   - children[0]:
   ```
   
   Your MRE:
   
   ```
   In [1]: import nanoarrow_mre
   
   In [2]: capsule = nanoarrow_mre.get_interval_capsule()
   
   In [3]: import nanoarrow as na
   
   In [4]: stream = na.c_lib.CArrayStream._import_from_c_capsule(capsule)
   
   In [5]: stream
   Out[5]: 
   <nanoarrow.c_lib.CArrayStream>
   - get_schema(): struct<interval_column: interval_month_day_nano>
   
   In [6]: arr = stream.get_next().child(0)
   
   In [7]: arr
   Out[7]: 
   <nanoarrow.c_lib.CArray interval_month_day_nano>
   - length: 4
   - offset: 0
   - null_count: 2
   - buffers: (94736573435584, 94736573573504)
   - dictionary: NULL
   - children[0]:
   
   In [8]: na.c_array_view(arr)
   Out[8]: 
   <nanoarrow.c_lib.CArrayView>
   - storage_type: 'interval_month_day_nano'
   - length: 4
   - offset: 0
   - null_count: 2
   - buffers[2]:
     - <bool validity[1 b] 00111111>
     - <interval_month_day_nano data[64 b] (0, 0, 0) (1, 1, 1) (42, 42, 42) (0, ...>
   - dictionary: NULL
   - children[0]:
   ```
   
   So the data itself looks good (the (1, 1, 1) and (42, 42, 42) values are 
still there); it is the validity bitmap that is wrong. PyArrow produces 
`01100000`, while your MRE produces `00111111`. That masks the (1, 1, 1) value 
and does not mask the 4th value, which makes its (0, 0, 0) visible.
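
To make the mismatch concrete, here is a minimal sketch that decodes the two validity bitmaps by hand. It assumes the nanoarrow repr prints bit 0 (the first element) leftmost, which is consistent with the PyArrow output above, and that Arrow validity bits are stored least-significant-bit first with 1 meaning valid. `byte_from_repr` and `decode_validity` are hypothetical helper names, not part of pyarrow or nanoarrow.

```python
def byte_from_repr(bits: str) -> int:
    """Turn a repr string such as '01100000' into a byte value.

    Assumption: the leftmost character is bit 0, so the string is
    reversed before being parsed as a binary number.
    """
    return int(bits[::-1], 2)


def decode_validity(bitmap: bytes, length: int, offset: int = 0) -> list:
    """Return one bool per element; True means valid (non-null)."""
    return [
        bool((bitmap[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]


# Bitmaps copied from the two c_array_view outputs above.
expected = decode_validity(bytes([byte_from_repr("01100000")]), 4)  # PyArrow
actual = decode_validity(bytes([byte_from_repr("00111111")]), 4)    # MRE

print(expected)  # [False, True, True, False]
print(actual)    # [False, False, True, True]
```

Under those assumptions, index 1 (holding (1, 1, 1)) is flagged null in the MRE bitmap and index 3 is flagged valid, which matches the description above.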
   
   
   


