[
https://issues.apache.org/jira/browse/ARROW-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568983#comment-17568983
]
Joris Van den Bossche commented on ARROW-17134:
-----------------------------------------------
The replacement array isn't expected to have the same shape as the input/mask
arrays (with values replaced position by position). Instead, it holds only the
values that are actually placed in the new array, so len(replacements) should
equal the number of true values in the mask.
So given that your {{arr2}} starts with two nulls, those two values are the
ones put in the result.
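To make that concrete, here is a minimal pure-Python sketch of the semantics described above. It is only a model for illustration, not the Arrow kernel: the helper name {{replace_with_mask_model}} is hypothetical, and it assumes the mask has no nulls and that there are at least as many replacements as true mask values.

```python
def replace_with_mask_model(values, mask, replacements):
    """Illustrative model only (hypothetical helper, not pyarrow code)."""
    remaining = iter(replacements)
    # Each true mask slot consumes the NEXT replacement in order;
    # the position of a value inside `replacements` is irrelevant.
    return [next(remaining) if m else v for v, m in zip(values, mask)]


arr1 = [1, 0, 1, None, None]
arr2 = [None, None, 1, 0, 1]
mask = [False, False, False, True, True]

# The first two entries of arr2 (both None) fill the two true slots.
aligned_input = replace_with_mask_model(arr1, mask, arr2)

# Passing exactly one replacement per true slot gives the expected values.
compact_input = replace_with_mask_model(arr1, mask, [0, 1])

print(aligned_input)
print(compact_input)
```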
Compared to numpy, it thus behaves like {{setitem}}
({{arr[mask] = replacements}}), and not like {{np.putmask}} (where values and
replacements have the same shape).
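As a small sketch of that numpy comparison (the arrays and the placeholder value -1 are made up for illustration): boolean-mask assignment takes one replacement per true mask value, while {{np.putmask}} picks the replacement at the same position as each true mask value.

```python
import numpy as np

mask = np.array([False, False, False, True, True])

# setitem: one replacement per True in the mask, consumed in order.
a = np.array([1, 0, 1, -1, -1])
a[mask] = [7, 8]

# putmask: replacements have the same shape as the array, and the value
# at the same position as each True mask entry is used.
b = np.array([1, 0, 1, -1, -1])
np.putmask(b, mask, [5, 6, 7, 8, 9])

print(a.tolist())
print(b.tolist())
```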
We should maybe consider raising an error if the {{replacements}} are too long?
For the case where you want to take the value at the corresponding (same)
location from either the values or the replacements, I think one can use
{{pc.if_else(mask, replacements, values)}}. Using your example:
{code}
In [13]: pc.if_else([False, False, False, True, True], arr2, arr1)
Out[13]:
<pyarrow.lib.Int64Array object at 0x7f52f4eecd60>
[
1,
0,
1,
0,
1
]
{code}
> [C++(?)/Python] pyarrow.compute.replace_with_mask does not replace null when
> providing an array mask
> ----------------------------------------------------------------------------------------------------
>
> Key: ARROW-17134
> URL: https://issues.apache.org/jira/browse/ARROW-17134
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Affects Versions: 8.0.0
> Reporter: Matthew Roeschke
> Priority: Major
>
>
> {code:java}
> In [1]: import pyarrow as pa
> In [2]: arr1 = pa.array([1, 0, 1, None, None])
> In [3]: arr2 = pa.array([None, None, 1, 0, 1])
> In [4]: pa.compute.replace_with_mask(arr1, [False, False, False, True, True],
> arr2)
> Out[4]:
> <pyarrow.lib.Int64Array object at 0x118a3e320>
> [
> 1,
> 0,
> 1,
> null, # I would expect 0
> null # I would expect 1
> ]
> In [5]: pa.__version__
> Out[5]: '8.0.0'{code}
>
> I have noticed this behavior with integer, floating-point, boolean, and temporal
> types.
>