jorisvandenbossche commented on code in PR #41889:
URL: https://github.com/apache/arrow/pull/41889#discussion_r1629849391


##########
python/pyarrow/io.pxi:
##########
@@ -1311,7 +1311,10 @@ cdef class Buffer(_Weakrefable):
         -------
         : bytes
         """
-        return self.buffer.get().ToHexString()
+        if self.is_cpu:
+            return self.buffer.get().ToHexString()
+        else:
+            raise NotImplementedError("Implemented only for data on CPU device")

Review Comment:
   ```suggestion
           if not self.is_cpu:
               raise NotImplementedError("Implemented only for data on CPU device")
           return self.buffer.get().ToHexString()
   ```
   
   (In this case your code is also perfectly fine, but I have a slight preference for always using the same pattern of checking `if not self.is_cpu`. Although it is only two lines and we only need it in three places for now, it could also be factored out into a helper method like `self._assert_cpu()`.)
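
   For illustration, a minimal sketch of what such a helper could look like (the `_assert_cpu` name is only a suggestion and does not exist in this PR):

   ```python
   # Hypothetical helper method on the Buffer class:
   def _assert_cpu(self):
       # Guard for operations that only work on CPU-accessible memory.
       if not self.is_cpu:
           raise NotImplementedError("Implemented only for data on CPU device")

   # hex() (and the other CPU-only methods) would then reduce to:
   def hex(self):
       self._assert_cpu()
       return self.buffer.get().ToHexString()
   ```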



##########
python/pyarrow/tests/test_io.py:
##########
@@ -669,6 +669,57 @@ def test_allocate_buffer_resizable():
     assert buf.size == 200
 
 
+def test_non_cpu_buffer(pickle_module):
+    cuda = pytest.importorskip("pyarrow.cuda")
+    ctx = cuda.Context(0)
+
+    arr = np.arange(4, dtype=np.int32)
+    cuda_buf = ctx.buffer_from_data(arr)
+
+    arr = pa.Array.from_buffers(pa.int32(), 4, [None, cuda_buf])
+    buf_on_gpu = arr.buffers()[1]
+
+    assert buf_on_gpu.size == cuda_buf.size
+    assert buf_on_gpu.address == cuda_buf.address
+    assert buf_on_gpu.is_cpu == cuda_buf.is_cpu
+
+    repr1 = "<pyarrow.Buffer address="
+    repr2 = "size=16 is_cpu=False is_mutable=True>"
+    assert repr1 in repr(buf_on_gpu)
+    assert repr2 in repr(buf_on_gpu)
+
+    msg = "Implemented only for data on CPU device"
+    with pytest.raises(NotImplementedError, match=msg):
+        buf_on_gpu.hex()
+
+    with pytest.raises(NotImplementedError, match=msg):
+        cuda_buf.hex()
+
+    assert buf_on_gpu.is_mutable
+
+    with pytest.raises(NotImplementedError, match=msg):
+        buf_on_gpu[1]
+
+    with pytest.raises(NotImplementedError, match=msg):
+        cuda_buf[1]
+
+    with pytest.raises(NotImplementedError, match=msg):
+        pickle_module.dumps(buf_on_gpu, protocol=4)
+
+    buf = pa.py_buffer(b'testing')
+    arr = pa.FixedSizeBinaryArray.from_buffers(pa.binary(7), 1, [None, buf])
+    buf_on_gpu = arr.buffers()[1]
+    buf_on_gpu_sliced = buf_on_gpu.slice(2)
+
+    buf = pa.py_buffer(b'sting')
+    arr = pa.FixedSizeBinaryArray.from_buffers(pa.binary(5), 1, [None, buf])
+    buf_on_gpu_expected = arr.buffers()[1]
+
+    assert buf_on_gpu_sliced.equals(buf_on_gpu_expected)
+    assert buf.equals(buf_on_gpu_expected)
+    assert buf_on_gpu_sliced.to_pybytes() == buf_on_gpu_expected.to_pybytes()

Review Comment:
   Could you also check `buf_on_gpu.to_pybytes()` against a hardcoded bytes result? Right now, if both the left and the right side return the same but wrong result, this will still pass.

   (Looking at the `to_pybytes` implementation, I would actually expect this not to work properly.)
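
   Something along these lines would make it explicit (just a sketch; the literals restate the bytes already used to build the buffers in the test above):

   ```python
       # Compare against hard-coded bytes so a shared wrong result cannot pass.
       assert buf_on_gpu.to_pybytes() == b'testing'
       assert buf_on_gpu_sliced.to_pybytes() == b'sting'
   ```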



##########
python/pyarrow/tests/test_io.py:
##########
@@ -669,6 +669,57 @@ def test_allocate_buffer_resizable():
     assert buf.size == 200
 
 
+def test_non_cpu_buffer(pickle_module):
+    cuda = pytest.importorskip("pyarrow.cuda")
+    ctx = cuda.Context(0)
+
+    arr = np.arange(4, dtype=np.int32)
+    cuda_buf = ctx.buffer_from_data(arr)
+
+    arr = pa.Array.from_buffers(pa.int32(), 4, [None, cuda_buf])
+    buf_on_gpu = arr.buffers()[1]
+
+    assert buf_on_gpu.size == cuda_buf.size
+    assert buf_on_gpu.address == cuda_buf.address
+    assert buf_on_gpu.is_cpu == cuda_buf.is_cpu
+
+    repr1 = "<pyarrow.Buffer address="
+    repr2 = "size=16 is_cpu=False is_mutable=True>"
+    assert repr1 in repr(buf_on_gpu)
+    assert repr2 in repr(buf_on_gpu)
+
+    msg = "Implemented only for data on CPU device"
+    with pytest.raises(NotImplementedError, match=msg):
+        buf_on_gpu.hex()
+
+    with pytest.raises(NotImplementedError, match=msg):
+        cuda_buf.hex()
+
+    assert buf_on_gpu.is_mutable
+
+    with pytest.raises(NotImplementedError, match=msg):
+        buf_on_gpu[1]
+
+    with pytest.raises(NotImplementedError, match=msg):
+        cuda_buf[1]
+
+    with pytest.raises(NotImplementedError, match=msg):
+        pickle_module.dumps(buf_on_gpu, protocol=4)

Review Comment:
   Similarly, can you add a version of this check using `cuda_buf` as well?
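
   For example (a sketch only, assuming pickling a `CudaBuffer` goes through the same CPU-only code path and raises the same error):

   ```python
       with pytest.raises(NotImplementedError, match=msg):
           pickle_module.dumps(cuda_buf, protocol=4)
   ```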



##########
python/pyarrow/tests/test_io.py:
##########
@@ -669,6 +669,57 @@ def test_allocate_buffer_resizable():
     assert buf.size == 200
 
 
+def test_non_cpu_buffer(pickle_module):
+    cuda = pytest.importorskip("pyarrow.cuda")
+    ctx = cuda.Context(0)
+
+    arr = np.arange(4, dtype=np.int32)
+    cuda_buf = ctx.buffer_from_data(arr)
+
+    arr = pa.Array.from_buffers(pa.int32(), 4, [None, cuda_buf])
+    buf_on_gpu = arr.buffers()[1]
+
+    assert buf_on_gpu.size == cuda_buf.size
+    assert buf_on_gpu.address == cuda_buf.address
+    assert buf_on_gpu.is_cpu == cuda_buf.is_cpu
+
+    repr1 = "<pyarrow.Buffer address="
+    repr2 = "size=16 is_cpu=False is_mutable=True>"
+    assert repr1 in repr(buf_on_gpu)
+    assert repr2 in repr(buf_on_gpu)
+
+    msg = "Implemented only for data on CPU device"
+    with pytest.raises(NotImplementedError, match=msg):
+        buf_on_gpu.hex()
+
+    with pytest.raises(NotImplementedError, match=msg):
+        cuda_buf.hex()
+
+    assert buf_on_gpu.is_mutable
+
+    with pytest.raises(NotImplementedError, match=msg):
+        buf_on_gpu[1]
+
+    with pytest.raises(NotImplementedError, match=msg):
+        cuda_buf[1]
+
+    with pytest.raises(NotImplementedError, match=msg):
+        pickle_module.dumps(buf_on_gpu, protocol=4)
+
+    buf = pa.py_buffer(b'testing')
+    arr = pa.FixedSizeBinaryArray.from_buffers(pa.binary(7), 1, [None, buf])
+    buf_on_gpu = arr.buffers()[1]
+    buf_on_gpu_sliced = buf_on_gpu.slice(2)
+
+    buf = pa.py_buffer(b'sting')
+    arr = pa.FixedSizeBinaryArray.from_buffers(pa.binary(5), 1, [None, buf])
+    buf_on_gpu_expected = arr.buffers()[1]
+
+    assert buf_on_gpu_sliced.equals(buf_on_gpu_expected)
+    assert buf.equals(buf_on_gpu_expected)

Review Comment:
   I find it surprising that this passes when checking equality between a CPU and a non-CPU buffer ...

   But can you also check `buf_on_gpu.equals(cuda_buf)` above? (i.e. combining a `Buffer` and a `CudaBuffer` object)
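
   A sketch of that extra check; whether it should return `True`, return `False`, or raise for a device buffer is exactly what is worth pinning down, so the assertion below only assumes equality is expected because both objects wrap the same device memory:

   ```python
       # buf_on_gpu and cuda_buf reference the same device memory, so one would
       # expect them to compare equal; verifying that is the point of the check.
       assert buf_on_gpu.equals(cuda_buf)
   ```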



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
