[ https://issues.apache.org/jira/browse/ARROW-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515280#comment-17515280 ]
David Li commented on ARROW-16081:
----------------------------------

NumPy's bool [is a byte|https://numpy.org/devdocs/user/basics.types.html] while Arrow's bool [is a bit|https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout]. Converting it via an array instead of a buffer will work:

{noformat}
>>> import pyarrow as pa
>>> import numpy as np
>>> data = np.array([True, False, True, False], dtype=bool)
>>> arr = pa.array(data)
>>> arr.to_numpy(zero_copy_only=False)
array([ True, False,  True, False])
>>> data
array([ True, False,  True, False])
{noformat}

> Incorrect results when reading a buffer of boolean values
> ---------------------------------------------------------
>
>                 Key: ARROW-16081
>                 URL: https://issues.apache.org/jira/browse/ARROW-16081
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu 20.04, Python 3.8.10, pyarrow==7.0.0
>            Reporter: Jonathan Kenyon
>            Priority: Major
>
> The following reproducer demonstrates that a buffer of boolean values is not
> correctly recovered when using pyarrow.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> import numpy as np
> if __name__ == "__main__":
>     data = np.array([True, False, True, False], dtype=bool)
>     length = len(data)
>     buf = pa.py_buffer(data)
>     array = pa.Array.from_buffers(pa.bool_(), length, [None, buf])
>     np.testing.assert_array_equal(data, array.to_numpy(zero_copy_only=False))
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)