jorisvandenbossche commented on issue #34634: URL: https://github.com/apache/arrow/issues/34634#issuecomment-1477517937
> According [to documentation](https://arrow.apache.org/docs/python/generated/pyarrow.compute.replace_with_mask.html) the following should be fulfilled: `len(replacements) == sum(mask == true)`.
> In this case, `len(replacements) == 2` and `sum(mask==True) == 1`.

Indeed, the usage in the top-post reproducer is actually wrong, but in general we currently ignore a `replacements` array that is too long (see also https://github.com/apache/arrow/issues/32436).

And indeed, the same issue also occurs if you provide the correct length:

```
In [12]: import pyarrow as pa
    ...: import pyarrow.compute as pc
    ...:
    ...: arr = pa.chunked_array([[True, True]])
    ...: mask = pa.array([False, True])
    ...: replacements = pa.array([False])

In [13]: pc.replace_with_mask(arr, mask, replacements)
Out[13]:
<pyarrow.lib.ChunkedArray object at 0x7f027c727650>
[
  <Invalid array: Buffer #1 too small in array of type bool and length 2: expected at least 1 byte(s), got 0
]
```

The support for chunked arrays in this kernel is generally limited (see https://github.com/apache/arrow/issues/31665), but providing a chunked array for the _input_ array should now work. The above example also works if I use a different type (e.g. int64) instead of boolean. So this seems to be an issue specifically with a boolean input array.
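For reference, the result expected from the reproducer can be modelled in plain Python. This is only a sketch of the documented contract, not pyarrow's kernel: the helper name `replace_with_mask_reference` is hypothetical, it works on Python lists, it does not model null mask entries, and it rejects a too-long `replacements` list, whereas the kernel currently ignores the extra values, as noted above.

```python
def replace_with_mask_reference(values, mask, replacements):
    """Pure-Python model of the documented contract (hypothetical helper)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")

    # Documented requirement: one replacement per True entry in the mask.
    needed = sum(1 for m in mask if m)
    if len(replacements) != needed:
        raise ValueError(
            f"expected {needed} replacement(s), got {len(replacements)}"
        )

    # Consume replacements in order, one for each True mask position.
    remaining = iter(replacements)
    return [next(remaining) if m else v for v, m in zip(values, mask)]


# Same data as the reproducer, written as plain lists.
result = replace_with_mask_reference([True, True], [False, True], [False])
print(result)  # [True, False]
```

Under these assumptions, a valid boolean result for the reproducer would be `[True, False]`, rather than the invalid array shown in the output above.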
