amol- commented on pull request #11185:
URL: https://github.com/apache/arrow/pull/11185#issuecomment-924960889


   > > At that point it would probably make sense to just raise a "mask must be 
a numpy array when data is a numpy array" error and make the user explicitly 
deal with the overhead
   > 
   > That would be fine as well I think, yes (although we generally accept 
array-likes in most places, so doing an `if mask is not None: mask = 
asarray(mask)` might be slightly more consistent)
   
   I went a bit down that path, but instead of forcing the mask to be a 
`numpy.array` as it was before, I forced it to be a `pyarrow.array`. Before 
the mask is sent down to C++, `asarray` gets invoked to guarantee that C++ 
always receives a `BooleanArray` as a mask.
   
   That seemed a more suitable strategy for two reasons. As far as I can 
understand, converting from `numpy.array` to `BooleanArray` should usually be 
zero copy and thus have little overhead. It also seems more reasonable for the 
codebase to use its own array implementation for masks instead of relying on a 
third-party library.
   
   For the end user not much has changed: they can provide anything (lists, 
`numpy.array`, `pyarrow.array`) as a mask and all of them are supported, but 
the code complexity is greatly reduced in the end by only having to support 
`pyarrow.array` internally.



