[ https://issues.apache.org/jira/browse/ARROW-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324893#comment-17324893 ]

Joris Van den Bossche commented on ARROW-12431:
-----------------------------------------------

[~nugend] thanks for the report!

It seems to happen specifically when the input array has NumPy's binary/string dtype (and not when it has object dtype):

{code}
In [27]: pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))
Out[27]: 
<pyarrow.lib.FixedSizeBinaryArray object at 0x7f6d65b32640>
[
  00
]

In [28]: pa.array(np.array([b'\x00'], dtype=object),type=pa.binary(1), mask = np.array([True]))
Out[28]: 
<pyarrow.lib.FixedSizeBinaryArray object at 0x7f6d65b32f40>
[
  null
]
{code}

(I assume the object-dtype array takes a similar code path to the list input.)
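
A possible workaround until the NumPy bytes path is fixed is to hand {{pa.array}} an object-dtype array, as in {{In [28]}}. One catch: converting an existing {{'S'}} array with {{.astype(object)}} is not safe here, because NumPy strips trailing NUL bytes when it returns {{'S'}} items ({{b'\x00'}} comes back as {{b''}}), which {{pa.binary(1)}} should then reject as the wrong length. The sketch below slices the raw buffer instead. {{fixed_bytes_to_object}} and {{fixed_binary_with_mask}} are hypothetical helper names, and the sketch only covers 1-D arrays converted to fixed-size binary.

{code:python}
import numpy as np
import pyarrow as pa


def fixed_bytes_to_object(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 1-D NumPy 'S' array -> object array of fixed-width bytes."""
    if values.dtype.kind != "S" or values.ndim != 1:
        raise TypeError("expected a 1-D NumPy bytes ('S') array")
    width = values.dtype.itemsize
    # Avoid values.astype(object): NumPy strips trailing NUL bytes from 'S' items.
    raw = np.ascontiguousarray(values).tobytes()
    # Slice the raw buffer so every item keeps exactly `width` bytes.
    items = [raw[i * width:(i + 1) * width] for i in range(len(values))]
    return np.array(items, dtype=object)


def fixed_binary_with_mask(values: np.ndarray, mask: np.ndarray) -> pa.Array:
    """Hypothetical helper: build a FixedSizeBinaryArray where mask=True means null,
    going through the object-dtype conversion path instead of the 'S'-dtype one."""
    obj = fixed_bytes_to_object(values)
    return pa.array(
        obj,
        type=pa.binary(values.dtype.itemsize),
        mask=np.asarray(mask, dtype=bool),
    )
{code}

For example, {{fixed_binary_with_mask(np.array([b'\x00', b'\x01']), np.array([True, False])).to_pylist()}} gives {{[None, b'\x01']}}: the masked slot is null and the NUL byte in the first item is not lost before the mask is applied.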

> [Python] pa.array mask inverted when type is binary and value to be converted is numpy array
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12431
>                 URL: https://issues.apache.org/jira/browse/ARROW-12431
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Daniel Nugent
>            Priority: Major
>
> {code:python}
> Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np
> >>> import pyarrow as pa
> >>>
> >>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([False]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa080ca3640>
> [
>   null
> ]
> >>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa080ca3700>
> [
>   00
> ]
> >>> pa.array([b'\x00'],type=pa.binary(1), mask = np.array([False]))
> <pyarrow.lib.FixedSizeBinaryArray object at 0x7fa083cc9520>
> [
>   00
> ]
> >>> pa.__version__
> '3.0.0'
> >>> np.__version__
> '1.20.1'
> {code}
> This happens with both FixedSizeBinary and variable-size binary (I was working with FixedSizeBinary). It does not happen for integers (and presumably not for other types, though I didn't check exhaustively).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
