amol- commented on a change in pull request #10199:
URL: https://github.com/apache/arrow/pull/10199#discussion_r627241356



##########
File path: cpp/src/arrow/python/numpy_to_arrow.cc
##########
@@ -594,7 +594,13 @@ Status NumPyConverter::Visit(const FixedSizeBinaryType& 
type) {
 
   if (mask_ != nullptr) {
     Ndarray1DIndexer<uint8_t> mask_values(mask_);
-    RETURN_NOT_OK(builder.AppendValues(data, length_, mask_values.data()));
+    std::unique_ptr<uint8_t[]> inverted_mask(new uint8_t[length_]);
+    for (int64_t i = 0; i < length_; ++i) {
+      inverted_mask[i] = !mask_values[i];
+    }

Review comment:
       Confirmed that this new implementation is fully sharing memory with 
numpy. 
   
   ```
           npa = np.array([b'aaa', b'bbb', b'ccc']*100)
           arrow_array = pa.array(npa, type=pa.binary(3),
                                  mask=np.array([False, False, False]*100))
           npa[npa == b"bbb"] = b"XXX"
   >       assert ([b'aaa', b'bbb', b'ccc']*100) == arrow_array.to_pylist()
   E       AssertionError: assert [b'aaa', b'bb..., b'ccc', ...] == [b'aaa', 
b'XX..., b'ccc', ...]
   E         At index 1 diff: b'bbb' != b'XXX'
   ```
   
   Not sure that's what we want to do. It seems that when doing 
`pa.array(np.array)` it's currently pretty random if we end up sharing memory 
or not. 
   
   For example 
   ```
   npa = np.array([1, 2, 3]*3)
   arrow_array = pa.array(npa, type=pa.int64())
   ```
   shares memory, while
   ``` 
   npa = np.array([1, 2, 3]*3)
   arrow_array = pa.array(npa, type=pa.int32())
   ``` 
   doesn't.
   
   That should probably be consolidated.
   
   Thinking about it, I don't think we should be sharing memory by default when 
building a new array, as the user expectation is probably that they are 
building something new.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to