[ 
https://issues.apache.org/jira/browse/ARROW-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-13632:
-----------------------------------
    Fix Version/s: 5.0.1
                   6.0.0

> [Python] Filter mask is always applied to elements at the beginning of 
> FixedSizeListArray when filtering a slice
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13632
>                 URL: https://issues.apache.org/jira/browse/ARROW-13632
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0
>         Environment: Windows 10, Python 3.9
>            Reporter: Vadym Zhernovyi
>            Priority: Major
>             Fix For: 6.0.0, 5.0.1
>
>
> When FixedSizeListArray.filter is called on a slice, the mask is always 
> applied to the first len(slice) elements at the beginning of the array from 
> which the slice was created, rather than to the elements of the slice itself.
> * The issue does not reproduce for 
> [ListArray|https://arrow.apache.org/docs/python/generated/pyarrow.ListArray.html].
> * The particular mask does not matter.
> * The slice length and position do not matter.
> * The number of elements filtered at the wrong position always equals the 
> length of the slice.
> * The element data type (int32, float, ...) does not matter.
> {code:python}
> Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) 
> [MSC v.1916 64 bit (AMD64)] on win32
> >>> import numpy as np
> >>> import pyarrow as pa
> >>> np.__version__
> '1.21.1'
> >>> pa.__version__
> '5.0.0'
> >>> data = [
>     np.zeros(3, dtype='int32'),
>     np.ones(3, dtype='int32'),
>     np.ones(3, dtype='int32') + 1,
>     np.ones(3, dtype='int32') + 2,
>     np.ones(3, dtype='int32') + 3,
>     np.ones(3, dtype='int32') + 4,
>     np.ones(3, dtype='int32') + 5,
>     np.ones(3, dtype='int32') + 6
>       ]
> >>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
> >>> a.filter(pa.array(len(a) * [True]))  # everything is ok 
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
> [
>   [0, 0, 0],
>   [1, 1, 1],
>   [2, 2, 2],
>   [3, 3, 3],
>   [4, 4, 4],
>   [5, 5, 5],
>   [6, 6, 6],
>   [7, 7, 7]
> ]
> >>> a[3:7].filter(pa.array(4 * [True]))  # output is the filtered elements of a[0:4] instead of a[3:7]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
> [
>   [0, 0, 0],
>   [1, 1, 1],
>   [2, 2, 2],
>   [3, 3, 3]
> ]
> >>> a[3:7].filter(pa.array([True, False, True, False]))  # output is the filtered elements of a[0:4] instead of a[3:7]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
> [
>   [0, 0, 0],
>   [2, 2, 2]
> ]
> >>> a[4:].filter(pa.array([True, True, True, True]))  # output is the filtered elements of a[0:4] instead of a[4:]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
> [
>   [0, 0, 0],
>   [1, 1, 1],
>   [2, 2, 2],
>   [3, 3, 3]
> ]
> >>> a[4:6].filter(pa.array([True, True]))  # output is the filtered elements of a[0:2] instead of a[4:6]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
> [
>   [0, 0, 0],
>   [1, 1, 1]
> ]
> >>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # ListArray slice filtering works ok
> <pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
> [
>   [3, 3, 3],
>   [4, 4, 4],
>   [5, 5, 5],
>   [6, 6, 6]
> ]
> {code}
>  
>  
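
The symptoms reported above (the result always comes from the start of the parent array, and the number of misplaced elements equals the slice length) are what one would expect if the filter ignored the slice offset when indexing the child values. The following is a toy model in plain Python, not Arrow's implementation; the class FixedSizeListView and the honor_offset flag are hypothetical names used only for illustration, and the actual root cause in Arrow is an assumption here.

```python
# Toy model only: NOT Arrow's implementation. It illustrates how ignoring a
# slice offset (an assumed cause) would reproduce the reported behaviour.
from dataclasses import dataclass
from typing import List


@dataclass
class FixedSizeListView:
    """Hypothetical fixed-size-list view over a flat, shared child buffer."""
    values: List[int]  # flat child values shared by every slice
    list_size: int     # number of child values per list element
    offset: int        # index of the first list element in this view
    length: int        # number of list elements in this view

    def slice(self, start: int, stop: int) -> "FixedSizeListView":
        # Slicing is zero-copy: only offset and length change.
        return FixedSizeListView(self.values, self.list_size,
                                 self.offset + start, stop - start)

    def _element(self, i: int, honor_offset: bool) -> List[int]:
        # The buggy path indexes from 0 instead of from self.offset.
        index = self.offset + i if honor_offset else i
        base = index * self.list_size
        return self.values[base:base + self.list_size]

    def filter(self, mask: List[bool], honor_offset: bool = True) -> List[List[int]]:
        if len(mask) != self.length:
            raise ValueError("mask length must match the view length")
        return [self._element(i, honor_offset)
                for i, keep in enumerate(mask) if keep]


# Same data as in the report: eight lists [0,0,0] ... [7,7,7].
values = [v for v in range(8) for _ in range(3)]
a = FixedSizeListView(values, list_size=3, offset=0, length=8)
mask = [True, False, True, False]

buggy = a.slice(3, 7).filter(mask, honor_offset=False)
correct = a.slice(3, 7).filter(mask, honor_offset=True)
print(buggy)    # [[0, 0, 0], [2, 2, 2]]  (matches the reported output)
print(correct)  # [[3, 3, 3], [5, 5, 5]]  (what the slice should yield)
```

In this model a variable-size list would not show the problem if its per-element offsets were themselves sliced along with the view, which would be consistent with ListArray being unaffected, though that too is only an inference from the report.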



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
