[ 
https://issues.apache.org/jira/browse/ARROW-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963184#comment-16963184
 ] 

Joris Van den Bossche commented on ARROW-7031:
----------------------------------------------

While looking at this, I bumped into the following question: should the offsets 
take into account the offset of the array's data (e.g. in the case of a sliced 
array)? 
Intuitively, I would say yes, as you want the offsets into the actual array you 
have at hand. However, {{.values}} does not take the offset into account either:

{code}
In [6]: arr = pa.ListArray.from_arrays(offsets=[0, 3, 5], values=[1, 2, 3, 4, 5])

In [7]: arr
Out[7]: 
<pyarrow.lib.ListArray object at 0x7fbaa5440f48>
[
  [
    1,
    2,
    3
  ],
  [
    4,
    5
  ]
]

In [8]: arr.values
Out[8]: 
<pyarrow.lib.Int64Array object at 0x7fbaa5440d68>
[
  1,
  2,
  3,
  4,
  5
]

In [9]: arr[1:]
Out[9]: 
<pyarrow.lib.ListArray object at 0x7fbaa43d7048>
[
  [
    4,
    5
  ]
]

In [10]: arr[1:].values
Out[10]: 
<pyarrow.lib.Int64Array object at 0x7fbaa43d7108>
[
  1,
  2,
  3,
  4,
  5
]
{code}

Although {{.values}} just exposes the underlying buffer as an Array (and 
slicing in C++ only changes the offset, not the data), in a Python interface 
I would rather expect to get a sliced version of {{.values}} as well.
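
To make the two possible behaviours concrete, here is a minimal pure-Python 
sketch. It is not pyarrow code: {{ListView}} and its methods 
({{raw_offsets}}, {{logical_offsets}}, {{sliced_values}}) are hypothetical 
names used only for illustration, and the model assumes a list array is just 
a shared flat values list, an offsets list, and an array-level offset and 
length that slicing adjusts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ListView:
    """Toy model of a list array (hypothetical, not part of pyarrow)."""

    values: List[int]   # flat child values, shared between slices
    offsets: List[int]  # one entry per list in the parent, plus one
    offset: int = 0     # array-level offset, moved by slicing
    length: int = -1    # number of lists visible in this view

    def __post_init__(self) -> None:
        if self.length < 0:
            self.length = len(self.offsets) - 1 - self.offset

    def slice(self, start: int) -> "ListView":
        # Slicing only moves the array-level offset; no data is copied.
        if not 0 <= start <= self.length:
            raise IndexError("slice start out of range")
        return ListView(self.values, self.offsets,
                        self.offset + start, self.length - start)

    def raw_offsets(self) -> List[int]:
        # Option 1: ignore the array-level offset (what .values does above).
        return list(self.offsets)

    def logical_offsets(self) -> List[int]:
        # Option 2: only the offsets that belong to the visible lists.
        return self.offsets[self.offset:self.offset + self.length + 1]

    def sliced_values(self) -> List[int]:
        # Child values restricted to the range covered by the visible lists.
        offs = self.logical_offsets()
        return self.values[offs[0]:offs[-1]]

    def to_pylist(self) -> List[List[int]]:
        offs = self.logical_offsets()
        return [self.values[a:b] for a, b in zip(offs, offs[1:])]


arr = ListView(values=[1, 2, 3, 4, 5], offsets=[0, 3, 5])
sliced = arr.slice(1)

print(sliced.to_pylist())        # the one visible list
print(sliced.values)             # unsliced child values, as in Out[10]
print(sliced.raw_offsets())      # offsets ignoring the array offset
print(sliced.logical_offsets())  # offsets honouring the array offset
print(sliced.sliced_values())    # values honouring the array offset
```

Note that in this sketch the offsets that honour the array offset still index 
into the unsliced values. If {{.values}} were sliced too, the offsets would 
presumably also need to be rebased to start at zero, which is part of the 
question above.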

> [Python] Expose the offsets of a ListArray in python
> ----------------------------------------------------
>
>                 Key: ARROW-7031
>                 URL: https://issues.apache.org/jira/browse/ARROW-7031
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> Assume the following ListArray:
> {code}
> In [1]: arr = pa.ListArray.from_arrays(offsets=[0, 3, 5], values=[1, 2, 3, 4, 5])
> In [2]: arr
> Out[2]: 
> <pyarrow.lib.ListArray object at 0x7f11de71c708>
> [
>   [
>     1,
>     2,
>     3
>   ],
>   [
>     4,
>     5
>   ]
> ]
> {code}
> You can get the actual values as a flat array through {{.values}} / 
> {{.flatten()}}, but there is currently no easy way to get back to the offsets 
> (except by interpreting the buffers manually). 
> We should probably add an {{offsets}} attribute (there is actually also a 
> TODO comment for that).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
