jorisvandenbossche commented on issue #34634:
URL: https://github.com/apache/arrow/issues/34634#issuecomment-1477517937

   > According [to documentation](https://arrow.apache.org/docs/python/generated/pyarrow.compute.replace_with_mask.html) the following should be fulfilled: `len(replacements) == sum(mask == true)`.
   > In this case, `len(replacements) == 2` and `sum(mask==True) == 1`.
   
   Indeed, the usage in the top-post reproducer is actually wrong, but in general we currently don't raise an error when the `replacements` array is too long (see also https://github.com/apache/arrow/issues/32436). And the same issue also occurs if you provide the correct length:
   
   ```
   In [12]: import pyarrow as pa
       ...: import pyarrow.compute as pc
       ...: 
       ...: arr = pa.chunked_array([[True, True]])
       ...: mask = pa.array([False, True])
       ...: replacements = pa.array([False])
   
   In [13]: pc.replace_with_mask(arr, mask, replacements)
   Out[13]: 
   <pyarrow.lib.ChunkedArray object at 0x7f027c727650>
   [
   <Invalid array: Buffer #1 too small in array of type bool and length 2: expected at least 1 byte(s), got 0
   ]
   ```
   
   The support for chunked arrays in this kernel is generally limited (see https://github.com/apache/arrow/issues/31665), but providing a chunked array for the _input_ array should now work. The above example also works if I use a different type (e.g. int64) instead of boolean. So this seems to be an issue specifically with a boolean input array.
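
For reference, here is a minimal pure-Python sketch of the semantics described in the documentation quoted above, to make explicit what the boolean example should return. The helper name `replace_with_mask_reference` is hypothetical and not part of pyarrow. It works on plain lists, and it assumes a strict length check, whereas the kernel currently ignores a `replacements` array that is too long.

```python
def replace_with_mask_reference(values, mask, replacements):
    """Pure-Python model of the documented replace_with_mask semantics.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as values")

    # Documented contract: len(replacements) == sum(mask == true).
    n_true = sum(1 for m in mask if m is True)
    if len(replacements) != n_true:
        raise ValueError(
            f"expected {n_true} replacement(s), got {len(replacements)}"
        )

    remaining = iter(replacements)
    result = []
    for value, m in zip(values, mask):
        if m is None:
            # A null mask entry produces a null output element.
            result.append(None)
        elif m:
            # A true mask entry consumes the next replacement value.
            result.append(next(remaining))
        else:
            # A false mask entry keeps the original value.
            result.append(value)
    return result


# Same inputs as the reproducer above, as plain lists.
print(replace_with_mask_reference([True, True], [False, True], [False]))
# prints [True, False]
```

With this model, the reproducer should give `[True, False]` rather than an invalid array, and the two-element `replacements` from the top post would be rejected.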
   
   


