u3Izx9ql7vW4 opened a new issue, #44121:
URL: https://github.com/apache/arrow/issues/44121

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I was looking around the internet for why Arrow's IPC is slow and came upon a [post on Stack Overflow](https://stackoverflow.com/questions/73221409/why-pyarrow-ipc-so-slow) from 2022, so I ran it on my machine. I was a bit amazed at the disparity.
   
   The following NumPy script took ~0.02s:
   ```Python
    import numpy as np
    import time
    import ctypes
    
    from multiprocessing import sharedctypes
    
    data = np.ones([1, 1, 544, 192], dtype=np.float32)
    
    # Pre-allocate a shared-memory buffer with room for 1000 copies of the data.
    capacity = 1000 * 1 * 544 * 192 * 10
    
    buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)
    ndarray = np.ndarray((capacity,), dtype=np.uint8, buffer=buffer)
    
    cur_offset = 0
    
    t = time.time()
    for i in range(1000):
        # Reinterpret the array as raw bytes and copy it into the shared buffer.
        data = np.frombuffer(data, dtype=np.uint8)
        data_size = data.shape[0]
        ndarray[cur_offset:data_size + cur_offset] = data
        cur_offset += data_size
    e = time.time()
    
    print(e - t)
    ```
   
   The PyArrow script below ran in ~0.8s:
   ```Python
    import numpy as np
    import pyarrow as pa
    import time
    import os
    
    data = np.ones((1, 1, 544, 992), dtype=np.float32)
    
    tensor = pa.Tensor.from_numpy(data)
    
    # Write the tensor 1000 times into a memory-mapped file.
    path = os.path.join(".", 'pyarrow-tensor-ipc-roundtrip')
    mmap = pa.create_memory_map(path, 5000000 * 1000)
    
    s = time.time()
    for i in range(1000):
        result = pa.ipc.write_tensor(tensor, mmap)
    e = time.time()
    
    print(e - s)
    
    # Write the same tensor 1000 times into an in-memory output stream.
    output_stream = pa.BufferOutputStream()
    
    s = time.time()
    for i in range(1000):
        result = pa.ipc.write_tensor(tensor, output_stream)
    e = time.time()
    
    print(e - s)
    ```
   Surprisingly, `BufferOutputStream` is 2x slower than `create_memory_map`. I also tried pointing the path at `/dev/shm/`, a directory that is actually backed by memory (tmpfs), and that sped things up to 0.6s. It seems like `create_memory_map` isn't using memory mapping at all. In fact, if you swap out
   ```Python
    mmap = pa.create_memory_map(path, 5000000 * 1000)
   ```
   with
   ```Python
    mmap = pa.OSFile(path, 'wb')
   ```
   you cut the write time in half! What's going on?
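   
   For reference, here is a minimal consolidated sketch of the three sinks I compared (the `-osfile` suffix is just a hypothetical second file name so the two outputs don't clobber each other, and the memory map assumes a few GB of free disk space):
   ```Python
    import os
    import time
    
    import numpy as np
    import pyarrow as pa
    
    data = np.ones((1, 1, 544, 992), dtype=np.float32)
    tensor = pa.Tensor.from_numpy(data)
    
    def bench(sink):
        # Write the tensor 1000 times to `sink` and return elapsed seconds.
        s = time.time()
        for _ in range(1000):
            pa.ipc.write_tensor(tensor, sink)
        return time.time() - s
    
    path = os.path.join(".", "pyarrow-tensor-ipc-roundtrip")
    
    # The memory-mapped sink from the script above (~5 GB backing file).
    print("create_memory_map: ", bench(pa.create_memory_map(path, 5000000 * 1000)))
    # A plain OS file (the "-osfile" suffix is hypothetical, to keep the files separate).
    print("OSFile:            ", bench(pa.OSFile(path + "-osfile", "wb")))
    # A purely in-memory stream.
    print("BufferOutputStream:", bench(pa.BufferOutputStream()))
    ```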
   
   ### Component(s)
   
   Python

