[ https://issues.apache.org/jira/browse/ARROW-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vadym Zhernovyi updated ARROW-13632:
------------------------------------
Description:
When calling FixedSizeListArray.filter on a slice, the mask is always applied to
the first len(slice) elements of the parent array the slice was created from,
rather than to the elements of the slice itself.
* The particular mask doesn't matter.
* Neither the slice length nor its position matters.
* The number of elements filtered at the wrong position always equals the
length of the slice.
* The data type (int32, float, ...) doesn't matter.
* The issue does not reproduce with
[ListArray|https://arrow.apache.org/docs/python/generated/pyarrow.ListArray.html].
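These symptoms are consistent with the slice offset being dropped somewhere in the filter path for FixedSizeListArray. That root cause is an assumption and has not been confirmed against the Arrow source. The sketch below is a hypothetical pure-Python model, not pyarrow code. It represents a fixed-size list array as a flat child buffer plus list_size, and slicing only moves an (offset, length) window over that buffer. `filter_correct` maps logical slot i to physical slot offset + i. `filter_ignoring_offset` drops the offset and reproduces the outputs shown in the transcript below. `FixedSizeListModel`, `filter_correct` and `filter_ignoring_offset` are made-up names for illustration only.

```python
# Hypothetical model of a sliced FixedSizeList array; this is NOT pyarrow's
# implementation, only an illustration of how a slice offset is meant to work.
class FixedSizeListModel:
    def __init__(self, values, list_size, offset=0, length=None):
        self.values = values          # flat child buffer shared by all slices
        self.list_size = list_size    # number of child values per list
        self.offset = offset          # first physical slot visible in this array
        # By default the array spans every complete list after `offset`.
        self.length = len(values) // list_size - offset if length is None else length

    def slice(self, start, stop):
        # Zero-copy slice: only the (offset, length) window changes.
        return FixedSizeListModel(self.values, self.list_size,
                                  self.offset + start, stop - start)

    def row(self, physical_index):
        begin = physical_index * self.list_size
        return self.values[begin:begin + self.list_size]

    def to_pylist(self):
        return [self.row(self.offset + i) for i in range(self.length)]


def _check_mask(arr, mask):
    if len(mask) != arr.length:
        raise ValueError("mask length must equal array length")


def filter_correct(arr, mask):
    """Keep logical slot i (physical slot arr.offset + i) where mask[i] is True."""
    _check_mask(arr, mask)
    return [arr.row(arr.offset + i) for i, keep in enumerate(mask) if keep]


def filter_ignoring_offset(arr, mask):
    """Assumed failure mode: the offset is dropped, so the mask is applied
    to physical slots 0..len(mask)-1 of the parent buffer."""
    _check_mask(arr, mask)
    return [arr.row(i) for i, keep in enumerate(mask) if keep]


# Same contents as `a` in the report: row k is [k, k, k] for k = 0..7.
a = FixedSizeListModel([k for k in range(8) for _ in range(3)], list_size=3)
s = a.slice(3, 7)
print(filter_correct(s, [True, False, True, False]))          # [[3, 3, 3], [5, 5, 5]]
print(filter_ignoring_offset(s, [True, False, True, False]))  # [[0, 0, 0], [2, 2, 2]]
```

In this model the wrong rows depend only on the mask length, never on where the slice starts. That matches the observation that neither slice position nor mask contents change which region of the parent array is read.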
{code:python}
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
>>> import numpy as np
>>> import pyarrow as pa
>>> np.__version__
'1.21.1'
>>> pa.__version__
'5.0.0'
>>> data = [
...     np.zeros(3, dtype='int32'),
...     np.ones(3, dtype='int32'),
...     np.ones(3, dtype='int32') + 1,
...     np.ones(3, dtype='int32') + 2,
...     np.ones(3, dtype='int32') + 3,
...     np.ones(3, dtype='int32') + 4,
...     np.ones(3, dtype='int32') + 5,
...     np.ones(3, dtype='int32') + 6
... ]
>>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
>>> a.filter(pa.array(len(a) * [True])) # everything is ok
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
[
[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7]
]
>>> a[3:7].filter(pa.array(4 * [True]))  # returns a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
[
[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3]
]
>>> a[3:7].filter(pa.array([True, False, True, False]))  # returns a[0] and a[2] instead of a[3] and a[5]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
[
[0, 0, 0],
[2, 2, 2]
]
>>> a[4:].filter(pa.array([True, True, True, True]))  # returns a[0:4] instead of a[4:]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
[
[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3]
]
>>> a[4:6].filter(pa.array([True, True]))  # returns a[0:2] instead of a[4:6]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
[
[0, 0, 0],
[1, 1, 1]
]
>>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # ListArray slice filtering works ok
<pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
[
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6]
]
{code}
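The following plain-Python check tests the claim that the mask always lands on the first len(slice) rows. It compares each output reported in the transcript against two reference computations. The first is what filtering a[start:stop] should return. The second is what the same mask returns when applied to a[0:len(mask)]. It reuses only the data shown above. `rows`, `expected`, `head_filtered` and `reported` are illustrative names, not pyarrow APIs.

```python
# Rows of `a` from the transcript above: row k is [k, k, k] for k = 0..7.
rows = [[k] * 3 for k in range(8)]

def expected(start, stop, mask):
    """Correct result of filtering a[start:stop] with `mask`."""
    return [row for row, keep in zip(rows[start:stop], mask) if keep]

def head_filtered(mask):
    """The same mask applied to a[0:len(mask)] (zip stops at the mask length)."""
    return [row for row, keep in zip(rows, mask) if keep]

# (start, stop, mask, output reported with pyarrow 5.0.0)
reported = [
    (3, 7, [True] * 4, [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]),
    (3, 7, [True, False, True, False], [[0, 0, 0], [2, 2, 2]]),
    (4, 8, [True] * 4, [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]),  # a[4:]
    (4, 6, [True, True], [[0, 0, 0], [1, 1, 1]]),
]

for start, stop, mask, observed in reported:
    # Every reported output equals the mask applied to the head of `a`...
    assert observed == head_filtered(mask)
    # ...and none of them equals the correct result for the slice.
    assert observed != expected(start, stop, mask)
    print(f"a[{start}:{stop}] expected {expected(start, stop, mask)}, got {observed}")
```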
> [Python] Filter mask is always applied to elements at the beginning of FixedSizeListArray when filtering a slice
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-13632
> URL: https://issues.apache.org/jira/browse/ARROW-13632
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 5.0.0
> Environment: Windows 10, Python 3.9
> Reporter: Vadym Zhernovyi
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)