amol- commented on a change in pull request #10199:
URL: https://github.com/apache/arrow/pull/10199#discussion_r632411238



##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -2666,6 +2666,38 @@ def test_array_masked():
     assert arr.type == pa.int64()
 
 
+def test_binary_array_masked():
+    # ARROW-12431
+    masked_basic = pa.array([b'\x05'], type=pa.binary(1),
+                            mask=np.array([False]))
+    assert [b'\x05'] == masked_basic.to_pylist()
+
+    # Fixed Length Binary
+    masked = pa.array(np.array([b'\x05']), type=pa.binary(1),
+                      mask=np.array([False]))
+    assert [b'\x05'] == masked.to_pylist()
+
+    masked_nulls = pa.array(np.array([b'\x05']), type=pa.binary(1),
+                            mask=np.array([True]))
+    assert [None] == masked_nulls.to_pylist()
+
+    # Variable Length Binary
+    masked = pa.array(np.array([b'\x05']), type=pa.binary(),
+                      mask=np.array([False]))
+    assert [b'\x05'] == masked.to_pylist()
+
+    masked_nulls = pa.array(np.array([b'\x05']), type=pa.binary(),
+                            mask=np.array([True]))
+    assert [None] == masked_nulls.to_pylist()
+
+    # Fixed Length Binary, copy

Review comment:
       It verifies that the behaviour is the same that we get from variable 
length binary arrays, which do not reuse the numpy array memory. I don't think 
it's an implementation detail because it changes the user experience.
   
   The fact that the underlying numpy array is shared or not changes the user 
experience as it means the user can't modify the original numpy array without 
indirectly modifying (probably unexpectedly) the Arrow array too.
   
   which lead me to create https://issues.apache.org/jira/browse/ARROW-12666 
because in some cases we reuse the numpy memory (all basic types) and in other 
cases we don't (the string, binary etc... types). The follow up ticket suggests 
to make that behaviour clear as numpy does by adding a `copy=True/False` 
argument to the `pyarrow.array`  function. 
   
   We can discuss further what's the best way to go in that dedicated ticket, 
here I wanted to make sure we were consistent with that happens when 
`pa.binary()` and `pa.binary(N)` are used. So the test verifies that if you 
modify the numpy array the arraw array doesn't change.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to