qiuyang163 opened a new issue, #13787:
URL: https://github.com/apache/arrow/issues/13787
I want to send a numpy array from one process to another process. First I use plain Python with `multiprocessing.sharedctypes`:
```
import numpy as np
import time
import ctypes
from multiprocessing import sharedctypes
data = np.ones([1, 1, 544, 192], dtype=np.float32)
capacity = 1000 * 1 * 544 * 192 * 10
buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)
ndarray = np.ndarray((capacity,), dtype=np.uint8, buffer=buffer)
cur_offset = 0
t = time.time()
for i in range(1000):
    # Reinterpret the float32 array as raw bytes (no copy).
    raw = np.frombuffer(data, dtype=np.uint8)
    data_size = raw.shape[0]
    # Copy the bytes into the shared buffer at the current offset.
    ndarray[cur_offset:cur_offset + data_size] = raw
    cur_offset += data_size
e = time.time()
print(e - t)
```
This takes 0.09337258338928223 s.
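For completeness, here is a minimal sketch of the receiving side of the shared-memory approach; the shape and dtype are hard-coded as an assumption, since in practice they have to be agreed on out of band:

```
import ctypes
import numpy as np
from multiprocessing import Process, sharedctypes

def reader(buf, nbytes):
    # Wrap the shared buffer without copying, then reinterpret the bytes.
    view = np.ndarray((nbytes,), dtype=np.uint8, buffer=buf)
    arr = view.view(np.float32).reshape(1, 1, 544, 192)
    print(arr.mean())  # prints 1.0 for the all-ones input

if __name__ == "__main__":
    data = np.ones((1, 1, 544, 192), dtype=np.float32)
    buf = sharedctypes.RawArray(ctypes.c_uint8, data.nbytes)
    # Writer side: copy the array bytes into the shared region.
    np.ndarray((data.nbytes,), dtype=np.uint8, buffer=buf)[:] = \
        np.frombuffer(data, dtype=np.uint8)
    p = Process(target=reader, args=(buf, data.nbytes))
    p.start()
    p.join()
```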
Then I use pyarrow (note the tensor here is (1, 1, 544, 992), so each write carries about five times more data than above):
```
import numpy as np
import pyarrow as pa
import time
import os
data = np.ones((1, 1, 544, 992), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)
path = os.path.join(".", "pyarrow-tensor-ipc-roundtrip")
mmap = pa.create_memory_map(path, 5000000 * 1000)
s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, mmap)
e = time.time()
print(e - s)
output_stream = pa.BufferOutputStream()
s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, output_stream)
e = time.time()
print(e - s)
```
and get 1.8259341716766357 s for the memory map and 3.7164011001586914 s for the BufferOutputStream.
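For reference, the read side of the memory-map roundtrip should be zero-copy; a minimal sketch, assuming only the first of the 1000 tensors (written back-to-back starting at offset 0) is wanted:

```
import pyarrow as pa

# Re-open the file written above read-only and read the first tensor back;
# read_tensor references the mapped pages instead of copying the payload.
mmap_r = pa.memory_map('./pyarrow-tensor-ipc-roundtrip')
mmap_r.seek(0)
tensor_back = pa.ipc.read_tensor(mmap_r)
arr = tensor_back.to_numpy()  # still backed by the mapping, no copy
```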
Why is pyarrow IPC so slow? Does it use shared memory to communicate between different processes?
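For what it's worth, pyarrow can also serialize a tensor directly into an explicit shared-memory segment; a minimal sketch, assuming Python 3.8+ `multiprocessing.shared_memory` (the variable names are mine, and I have not benchmarked this):

```
import numpy as np
import pyarrow as pa
from multiprocessing import shared_memory

data = np.ones((1, 1, 544, 192), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)

# Allocate a shared segment large enough for the serialized tensor.
shm = shared_memory.SharedMemory(create=True,
                                 size=pa.ipc.get_tensor_size(tensor))
pa.ipc.write_tensor(tensor, pa.FixedSizeBufferWriter(pa.py_buffer(shm.buf)))

# A second process would attach with SharedMemory(name=shm.name) and read
# without copying the payload.
reader = pa.BufferReader(pa.py_buffer(shm.buf))
arr = pa.ipc.read_tensor(reader).to_numpy()
print(arr.mean())  # 1.0

del arr, reader  # drop views into the segment before releasing it
shm.close()
shm.unlink()
```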