cyb70289 commented on issue #13787: URL: https://github.com/apache/arrow/issues/13787#issuecomment-1207606511
It looks like the two cases are tested against different data sizes: [1, 1, 544, **192**] vs. [1, 1, 544, **992**].

Besides, for the first test case, I believe the line below commits all the physical pages up front: `buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)` So the benchmarked loop won't cause any page faults.

For the pyarrow case, however, the line below only reserves pages without actually allocating any of them: `mmap = pa.create_memory_map(path, 5000000 * 1000)` So the benchmarked loop will trigger a large number of page faults.

I benchmarked the running time of the whole program with the same data size and the temp file under '/dev/shm': pyarrow (0.459s) is faster than sharedctypes (0.947s).
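To illustrate the page-fault point, here is a minimal sketch using only the Python standard library. It is not the benchmark from the issue: it assumes a Unix-like system (the `resource` module is unavailable on Windows), uses an anonymous `mmap` as a stand-in for a freshly created memory map, and the names `SIZE`, `minor_faults`, and `touch_every_page` are made up for the example. It counts minor page faults while touching each page once, then again after the pages are already resident.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE
SIZE = 16 * 1024 * 1024  # 16 MiB, an arbitrary size chosen for this sketch


def minor_faults():
    """Return the minor page faults this process has incurred so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_every_page(buf):
    """Write one byte per page so each page has to be backed by memory."""
    for offset in range(0, len(buf), PAGE):
        buf[offset] = 1


# Anonymous mapping: address space is reserved, pages are typically
# not backed by physical memory until they are first touched.
buf = mmap.mmap(-1, SIZE)

before = minor_faults()
touch_every_page(buf)  # first touch: the kernel has to supply each page
first_pass = minor_faults() - before

before = minor_faults()
touch_every_page(buf)  # pages are already resident at this point
second_pass = minor_faults() - before

print("pages touched:         ", SIZE // PAGE)
print("faults on first pass:  ", first_pass)
print("faults on second pass: ", second_pass)
```

On a typical Linux machine the first pass should report far more faults than the second, though the exact numbers depend on the kernel and its configuration. If that holds for the benchmark too, touching the mapped region once before starting the timer would make the two cases more comparable.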
