u3Izx9ql7vW4 opened a new issue, #44121: URL: https://github.com/apache/arrow/issues/44121
### Describe the bug, including details regarding any error messages, version, and platform.

I was looking around the internet for why Arrow's IPC is slow, and I came upon a [post on Stack Overflow](https://stackoverflow.com/questions/73221409/why-pyarrow-ipc-so-slow) from 2022, and ran it on my machine. I was a bit amazed at the disparity.

The following NumPy script took ~0.02s:

```Python
import numpy as np
import time
import ctypes
from multiprocessing import sharedctypes

data = np.ones([1, 1, 544, 192], dtype=np.float32)

# Shared byte buffer large enough to hold every copy written below.
capacity = 1000 * 1 * 544 * 192 * 10
buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)
ndarray = np.ndarray((capacity,), dtype=np.uint8, buffer=buffer)
cur_offset = 0

t = time.time()
for i in range(1000):
    # View the array as raw bytes and copy it into the shared buffer.
    data = np.frombuffer(data, dtype=np.uint8)
    data_size = data.shape[0]
    ndarray[cur_offset:data_size + cur_offset] = data
    cur_offset += data_size
e = time.time()
print(e - t)
```

The script below ran for 0.8s:

```Python
import numpy as np
import pyarrow as pa
import time
import os

data = np.ones((1, 1, 544, 992), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)

# Case 1: write the tensor 1000 times into a memory-mapped file.
path = os.path.join(str("./"), 'pyarrow-tensor-ipc-roundtrip')
mmap = pa.create_memory_map(path, 5000000 * 1000)

s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, mmap)
e = time.time()
print(e - s)

# Case 2: write the tensor 1000 times into an in-memory output stream.
output_stream = pa.BufferOutputStream()

s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, output_stream)
e = time.time()
print(e - s)
```

Surprisingly, `BufferOutputStream` is 2x slower than `create_memory_map`. I also tried replacing the path with `/dev/shm/`, which is an actual shared-memory directory, and that sped things up to 0.6s.

It seems like `create_memory_map` isn't using memory mapping at all. In fact, if you swap out

```
mmap = pa.create_memory_map(path, 5000000 * 1000)
```

with

```
mmap = pa.OSFile(path, 'wb')
```

you'll cut the write time in half! What's going on?
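One caveat when reading the two timings: the scripts do not write the same number of bytes per iteration. The NumPy array has shape `(1, 1, 544, 192)` and the Arrow tensor `(1, 1, 544, 992)`, both `float32`, so the Arrow loop writes roughly five times as many bytes per iteration. A rough way to compare them is by throughput rather than wall time. The sketch below uses only the standard library. The helper names (`payload_bytes`, `throughput_mb_s`, `time_writes`) are made up for illustration, and the `io.BytesIO` sink is just a stand-in to show how the helper would be called, not a substitute for any Arrow sink.

```python
import io
import time


def payload_bytes(shape, itemsize=4):
    """Bytes in a dense array of the given shape (itemsize 4 for float32)."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n


def throughput_mb_s(nbytes, seconds):
    """Convert a byte count and an elapsed time into MB/s (1 MB = 1e6 bytes)."""
    return nbytes / seconds / 1e6


def time_writes(write, payload, iterations=1000):
    """Call write(payload) `iterations` times and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        write(payload)
    return time.perf_counter() - start


numpy_case = payload_bytes((1, 1, 544, 192))  # 417,792 bytes per iteration
arrow_case = payload_bytes((1, 1, 544, 992))  # 2,158,592 bytes per iteration

# Throughput implied by the reported timings (~0.02 s and ~0.8 s per 1000 writes).
print("numpy copy :", throughput_mb_s(numpy_case * 1000, 0.02), "MB/s")
print("arrow write:", throughput_mb_s(arrow_case * 1000, 0.8), "MB/s")

# Hypothetical usage: time any callable sink, here a plain in-memory buffer.
sink = io.BytesIO()
elapsed = time_writes(sink.write, bytes(numpy_case), iterations=100)
print("BytesIO    :", throughput_mb_s(numpy_case * 100, elapsed), "MB/s")
```

Even after normalizing by bytes written, the reported numbers still leave a sizeable gap, so the difference in array shape only accounts for part of the disparity.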
### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org