qiuyang163 opened a new issue, #13787:
URL: https://github.com/apache/arrow/issues/13787
I want to send a numpy array from one process to another process. First I use plain Python with `multiprocessing.sharedctypes`:
```
import numpy as np
import time
import ctypes
from multiprocessing import sharedctypes
data = np.ones([1, 1, 544, 192], dtype=np.float32)
capacity = 1000 * 1 * 544 * 192 * 10
buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)
ndarray = np.ndarray((capacity,), dtype=np.uint8, buffer=buffer)
cur_offset = 0
t = time.time()
for i in range(1000):
    # Reinterpret the float32 array as raw bytes (no copy).
    raw = np.frombuffer(data, dtype=np.uint8)
    data_size = raw.shape[0]
    # Copy the bytes into the shared buffer at the current offset.
    ndarray[cur_offset:cur_offset + data_size] = raw
    cur_offset += data_size
e = time.time()
print(e - t)
```
This takes 0.09337258338928223 s.
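For completeness, here is a minimal sketch of the receiving side of the shared-memory approach; the shape and dtype are hard-coded as an assumption, since in practice they have to be agreed on out of band:

```
import ctypes
import numpy as np
from multiprocessing import Process, sharedctypes

def reader(buf, nbytes):
    # Wrap the shared buffer without copying, then reinterpret the bytes.
    view = np.ndarray((nbytes,), dtype=np.uint8, buffer=buf)
    arr = view.view(np.float32).reshape(1, 1, 544, 192)
    print(arr.mean())  # prints 1.0 for the all-ones input

if __name__ == "__main__":
    data = np.ones((1, 1, 544, 192), dtype=np.float32)
    buf = sharedctypes.RawArray(ctypes.c_uint8, data.nbytes)
    # Writer side: copy the array bytes into the shared region.
    np.ndarray((data.nbytes,), dtype=np.uint8, buffer=buf)[:] = \
        np.frombuffer(data, dtype=np.uint8)
    p = Process(target=reader, args=(buf, data.nbytes))
    p.start()
    p.join()
```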
Then I use pyarrow (note the tensor here is (1, 1, 544, 992), so each write carries about five times more data than above):
```
import numpy as np
import pyarrow as pa
import time
import os
data = np.ones((1, 1, 544, 992), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)
path = os.path.join(".", "pyarrow-tensor-ipc-roundtrip")
mmap = pa.create_memory_map(path, 5000000 * 1000)
s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, mmap)
e = time.time()
print(e - s)
output_stream = pa.BufferOutputStream()
s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, output_stream)
e = time.time()
print(e - s)
```
and get 1.8259341716766357 s for the memory map and 3.7164011001586914 s for the BufferOutputStream.
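For reference, the read side of the memory-map roundtrip should be zero-copy; a minimal sketch, assuming only the first of the 1000 tensors (written back-to-back starting at offset 0) is wanted:

```
import pyarrow as pa

# Re-open the file written above read-only and read the first tensor back;
# read_tensor references the mapped pages instead of copying the payload.
mmap_r = pa.memory_map('./pyarrow-tensor-ipc-roundtrip')
mmap_r.seek(0)
tensor_back = pa.ipc.read_tensor(mmap_r)
arr = tensor_back.to_numpy()  # still backed by the mapping, no copy
```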
Why is pyarrow IPC so slow? Does it use shared memory to communicate between different processes?
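For what it's worth, pyarrow can also serialize a tensor directly into an explicit shared-memory segment; a minimal sketch, assuming Python 3.8+ `multiprocessing.shared_memory` (the variable names are mine, and I have not benchmarked this):

```
import numpy as np
import pyarrow as pa
from multiprocessing import shared_memory

data = np.ones((1, 1, 544, 192), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)

# Allocate a shared segment large enough for the serialized tensor.
shm = shared_memory.SharedMemory(create=True,
                                 size=pa.ipc.get_tensor_size(tensor))
pa.ipc.write_tensor(tensor, pa.FixedSizeBufferWriter(pa.py_buffer(shm.buf)))

# A second process would attach with SharedMemory(name=shm.name) and read
# without copying the payload.
reader = pa.BufferReader(pa.py_buffer(shm.buf))
arr = pa.ipc.read_tensor(reader).to_numpy()
print(arr.mean())  # 1.0

del arr, reader  # drop views into the segment before releasing it
shm.close()
shm.unlink()
```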