JacekPliszka opened a new issue, #38640: URL: https://github.com/apache/arrow/issues/38640
### Describe the enhancement requested

1. Using `int` should not be that much slower than using `np.uint8`.
2. numpy is fastest for `b"s"`, which fails for pyarrow.

    In [21]: val = np.uint8(115)

    In [22]: %timeit np.count_nonzero(data_np == val)
    591 µs ± 3.56 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    In [23]: %timeit np.count_nonzero(data_np == 115)
    598 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    In [24]: %timeit np.count_nonzero(data_np == b"s")
    403 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    In [25]: %timeit pc.equal(data_pa, val).sum().as_py()
    1.64 ms ± 8.23 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    In [26]: %timeit pc.equal(data_pa, 115).sum().as_py()
    15.6 ms ± 21.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [27]: %timeit pc.equal(data_pa, b"s").sum().as_py()
    ArrowNotImplementedError: Function 'equal' has no kernel matching input types (uint8, binary)

### Component(s)

Python
