[ https://issues.apache.org/jira/browse/ARROW-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963184#comment-16963184 ]
Joris Van den Bossche commented on ARROW-7031:
----------------------------------------------
While looking at this, I ran into the following question: should the offsets
take into account the offset of the array's data (e.g. in the case of a sliced
array)?
Intuitively, I would say yes, since you want the offsets for the array you
actually have at hand. However, {{.values}} currently does not take the offset
into account either:
{code}
In [6]: arr = pa.ListArray.from_arrays(offsets=[0, 3, 5], values=[1, 2, 3, 4, 5])
In [7]: arr
Out[7]:
<pyarrow.lib.ListArray object at 0x7fbaa5440f48>
[
  [
    1,
    2,
    3
  ],
  [
    4,
    5
  ]
]
In [8]: arr.values
Out[8]:
<pyarrow.lib.Int64Array object at 0x7fbaa5440d68>
[
  1,
  2,
  3,
  4,
  5
]
In [9]: arr[1:]
Out[9]:
<pyarrow.lib.ListArray object at 0x7fbaa43d7048>
[
  [
    4,
    5
  ]
]
In [10]: arr[1:].values
Out[10]:
<pyarrow.lib.Int64Array object at 0x7fbaa43d7108>
[
  1,
  2,
  3,
  4,
  5
]
{code}
Although {{.values}} just exposes the underlying buffer as an Array (and
slicing in C++ only changes the offset, not the data), in a Python interface I
would probably expect {{.values}} to return a sliced version as well.
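
To make the two options concrete, here is a minimal pure-Python sketch. It does not use pyarrow: the class {{ListArrayModel}} and all of its method names are hypothetical and exist only to illustrate the question. It assumes a slice shares the full offsets and values data and only records a new offset and length, as described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ListArrayModel:
    """Hypothetical stand-in for a list array; this is NOT the pyarrow API."""
    raw_offsets: List[int]  # full offsets data, shared between slices
    raw_values: List[int]   # full flat values, shared between slices
    offset: int = 0         # index of the first list in this (sliced) view
    length: int = -1        # number of lists in this view (-1: derive it)

    def __post_init__(self):
        if self.length < 0:
            self.length = len(self.raw_offsets) - 1 - self.offset

    def slice(self, start):
        # Only offset/length change; the underlying data is not copied.
        return ListArrayModel(self.raw_offsets, self.raw_values,
                              self.offset + start, self.length - start)

    def to_pylist(self):
        o = self.raw_offsets
        return [self.raw_values[o[i]:o[i + 1]]
                for i in range(self.offset, self.offset + self.length)]

    def offsets_ignoring_slice(self):
        # Option A: expose the underlying offsets as they are.
        return list(self.raw_offsets)

    def offsets_respecting_slice(self):
        # Option B: only the offsets that belong to this view.
        return self.raw_offsets[self.offset:self.offset + self.length + 1]

    def values_respecting_slice(self):
        # Values restricted to the range covered by this view's offsets.
        o = self.offsets_respecting_slice()
        return self.raw_values[o[0]:o[-1]]


arr = ListArrayModel([0, 3, 5], [1, 2, 3, 4, 5])
sliced = arr.slice(1)

print(sliced.to_pylist())                 # [[4, 5]]
print(sliced.offsets_ignoring_slice())    # [0, 3, 5]
print(sliced.offsets_respecting_slice())  # [3, 5]
print(sliced.values_respecting_slice())   # [4, 5]
```

Note that in this sketch the slice-aware offsets ({{[3, 5]}}) still index into the full values, not into the sliced values. If {{.values}} were sliced as well, the offsets would additionally need to be rebased to start at 0 to stay consistent with it.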
> [Python] Expose the offsets of a ListArray in python
> ----------------------------------------------------
>
> Key: ARROW-7031
> URL: https://issues.apache.org/jira/browse/ARROW-7031
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
>
> Assume the following ListArray:
> {code}
> In [1]: arr = pa.ListArray.from_arrays(offsets=[0, 3, 5], values=[1, 2, 3, 4, 5])
>
> In [2]: arr
> Out[2]:
> <pyarrow.lib.ListArray object at 0x7f11de71c708>
> [
>   [
>     1,
>     2,
>     3
>   ],
>   [
>     4,
>     5
>   ]
> ]
> {code}
> You can get the actual values as a flat array through {{.values}} /
> {{.flatten()}}, but there is currently no easy way to get back to the offsets
> (except by interpreting the buffers manually).
> We should probably add an {{offsets}} attribute (there is actually also a
> TODO comment for that).