qiuyang163 commented on issue #13787:
URL: https://github.com/apache/arrow/issues/13787#issuecomment-1207709334
> ```python
> import numpy as np
> import pyarrow as pa
> import time
> import os
>
> data = np.ones((1, 1, 544, 192), dtype=np.float32)
>
> tensor = pa.Tensor.from_numpy(data)
>
> path = os.path.join(str("/dev/shm/"), 'pyarrow-tensor-ipc-roundtrip')
> mmap = pa.create_memory_map(path, 5000000 * 1000)
>
> s = time.time()
> for i in range(1000):
>     result = pa.ipc.write_tensor(tensor, mmap)
> e = time.time()
>
> print(e - s)
>
> #output_stream = pa.BufferOutputStream()
> #
> #s = time.time()
> #for i in range(1000):
> #    result = pa.ipc.write_tensor(tensor, output_stream)
> #e = time.time()
> #
> #print(e - s)
> ```
Sorry, I still don't understand. In your opinion, the line `mmap =
pa.create_memory_map(path, 5000000 * 1000)` only reserves pages without
actually allocating any of them, so the write loop triggers page faults.
Why, then, is pyarrow (0.459s) faster than sharedctypes (0.947s)?
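If first-touch page faults in the write loop are what you mean, one way I could check is to run the loop once untimed so the pages are already mapped in, then rewind and time a second pass over the same region. Just a sketch on my side, reusing the imports, `tensor`, and `mmap` from the snippet above:
```python
# Untimed warm-up pass: fault in every page the timed loop will touch.
for i in range(1000):
    pa.ipc.write_tensor(tensor, mmap)
mmap.seek(0)  # rewind to the start of the mapping

# Timed pass over pages that are now resident.
s = time.time()
for i in range(1000):
    pa.ipc.write_tensor(tensor, mmap)
e = time.time()
print("after warm-up:", e - s)
```
If the timing barely changes after the warm-up, page faults are probably not the main cost here.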
I modified my test code so that both versions use the same data size:
```python
import numpy as np
import pyarrow as pa
import time
import os

data = np.ones((1, 1, 544, 192), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)

path = os.path.join("/dev/shm/", 'pyarrow-tensor-ipc-roundtrip')
mmap = pa.create_memory_map(path, 5000000 * 1000)

# time 1000 IPC tensor writes into the memory-mapped file in /dev/shm
s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, mmap)
e = time.time()
print(e - s)
```
pyarrow: 0.23 s
```python
import numpy as np
import time
import ctypes
from multiprocessing import sharedctypes

data = np.ones([1, 1, 544, 192], dtype=np.float32)
capacity = 1000 * 1 * 544 * 192 * 4  # same total size as 1000 tensor writes
buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)
ndarray = np.ndarray((capacity,), dtype=np.uint8, buffer=buffer)

cur_offset = 0
t = time.time()
for i in range(1000):
    # reinterpret the float32 data as raw bytes and copy it into shared memory
    flat = np.frombuffer(data, dtype=np.uint8)
    data_size = flat.shape[0]
    ndarray[cur_offset:data_size + cur_offset] = flat
    cur_offset += data_size
e = time.time()
print(e - t)
```
sharedctypes: 0.10 s
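For reference, `pa.ipc.write_tensor` writes an IPC metadata message plus the tensor body on every call, so the two loops do slightly different work. A rough way to level them (again just a sketch, reusing `data` and `mmap` from my pyarrow snippet above) would be to copy only the raw bytes into the same memory map:
```python
# Copy only the raw float32 payload into the memory map, with no IPC framing,
# so the copied bytes match the sharedctypes slice assignment.
raw = data.tobytes()  # 1 * 1 * 544 * 192 * 4 = 417792 bytes
mmap.seek(0)

s = time.time()
for i in range(1000):
    mmap.write(raw)
e = time.time()
print("raw copy into mmap:", e - s)
```
If that raw copy lands close to the sharedctypes number, the remaining gap would mostly be the per-call IPC framing rather than the shared-memory write itself.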