raulcd commented on issue #44731:
URL: https://github.com/apache/arrow/issues/44731#issuecomment-2476526800
This is the current behavior on main too. `slice` computes a zero-copy slice
of the array by updating only the length and/or offset of the array being
sliced; the child arrays are left untouched:
```
>>> import pyarrow as pa
>>> import nanoarrow as na
>>> original_array = pa.array([{'a': 0}, {'a': 1}, {'a': 2}])
>>> sliced_array = original_array.slice(0,1)
>>> sliced_array
<pyarrow.lib.StructArray object at 0x764b3cf5d2a0>
-- is_valid: all not null
-- child 0 type: int64
[
0
]
>>> na.array(sliced_array).inspect()
<ArrowArray struct<a: int64>>
- length: 1
- offset: 0
- null_count: 0
- buffers[1]:
  - validity <bool[0 b] >
- dictionary: NULL
- children[1]:
  'a': <ArrowArray int64>
    - length: 3
    - offset: 0
    - null_count: 0
    - buffers[2]:
      - validity <bool[0 b] >
      - data <int64[24 b] 0 1 2>
    - dictionary: NULL
    - children[0]:
>>> na.array(original_array).inspect()
<ArrowArray struct<a: int64>>
- length: 3
- offset: 0
- null_count: 0
- buffers[1]:
  - validity <bool[0 b] >
- dictionary: NULL
- children[1]:
  'a': <ArrowArray int64>
    - length: 3
    - offset: 0
    - null_count: 0
    - buffers[2]:
      - validity <bool[0 b] >
      - data <int64[24 b] 0 1 2>
    - dictionary: NULL
    - children[0]:
```
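To make the zero-copy point concrete, here is a small pure-Python toy model of what the `inspect()` output above shows. `ToyStruct` is a hypothetical stand-in written only for illustration; it is not a pyarrow or nanoarrow class and does not reflect how either library is implemented:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStruct:
    """Hypothetical stand-in for a struct array; not a pyarrow class."""

    children: dict  # field name -> full child values, shared between slices
    length: int
    offset: int = 0

    def slice(self, offset, length):
        # Zero-copy: only the parent's window changes, children are reused as-is.
        # Assumes 0 <= offset <= self.length.
        length = max(0, min(length, self.length - offset))
        return ToyStruct(self.children, length, self.offset + offset)

    def to_pylist(self):
        # A reader has to apply the parent's offset and length to every child.
        start, stop = self.offset, self.offset + self.length
        names = list(self.children)
        return [{n: self.children[n][i] for n in names} for i in range(start, stop)]


original = ToyStruct({"a": [0, 1, 2]}, length=3)
head = original.slice(0, 1)  # parent: offset 0, length 1
tail = original.slice(1, 2)  # parent: offset 1, length 2

print(head.length, len(head.children["a"]))  # parent length vs. child length
print(tail.offset, tail.to_pylist())
```

In this model the child still holds all three values after slicing, which mirrors the `length: 3` reported for child `'a'` above, and any consumer that ignores the parent's offset would read the wrong rows.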
I understand the use case, but I am unsure what the behavior should be when
generating the `RecordBatch` if the slice has updated the offset of the
struct array. As an example:
```
>>> pa.table(pa.array([{'a': 0}, {'a': 1}, {'a': 2}]).slice(1,2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/table.pxi", line 6172, in pyarrow.lib.table
batch = record_batch(data, schema)
File "pyarrow/table.pxi", line 5991, in pyarrow.lib.record_batch
batch = RecordBatch._import_from_c_device_capsule(schema_capsule, array_capsule)
File "pyarrow/table.pxi", line 4002, in pyarrow.lib.RecordBatch._import_from_c_device_capsule
batch = GetResultValue(ImportDeviceRecordBatch(c_array, c_schema))
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
return check_status(status)
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
raise convert_status(status)
pyarrow.lib.ArrowInvalid: ArrowArray struct has non-zero offset, cannot be imported as RecordBatch
>>> pa.array([{'a': 0}, {'a': 1}, {'a': 2}]).slice(1,2)
<pyarrow.lib.StructArray object at 0x764b3cf5d8a0>
-- is_valid: all not null
-- child 0 type: int64
[
1,
2
]
>>>
```
@jorisvandenbossche @pitrou