Dimitris Lekkas created ARROW-5069:
--------------------------------------
Summary: Implement direct support for shared memory arrow columns
Key: ARROW-5069
URL: https://issues.apache.org/jira/browse/ARROW-5069
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Environment: Linux
Reporter: Dimitris Lekkas
Fix For: 0.14.0
I think the option of memory-mapping columns to shared memory would be
valuable. Such an option would be triggered when specific metadata is supplied.
Given that many data frames backed by Arrow are used for machine learning, we
could benefit from treating differently the data (most likely the columns' data
buffers) that will be fed into GPUs/FPGAs. To enable this change we would need
to address the following issues:
First, each column needs to hold an integer value representing its associated
file descriptor. This field is meaningful only within the owning process and
should not be transmitted when performing IPC. Instead, the application
developer could retrieve the file name from the file descriptor (e.g. on Linux
by resolving /proc/self/fd/<fd>; fstat alone returns file metadata, not a name)
and tell another application to reference that file, or tell an FPGA to DMA
that memory area.
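As a rough illustration of the descriptor-to-file-name step, here is a minimal
Linux-only sketch. PathFromFd and MapShared are hypothetical helper names, not
part of the Arrow API, and the sketch assumes /proc is mounted and that the
column data lives in a regular file that a second process can open by path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

#include <fcntl.h>
#include <limits.h>    // PATH_MAX
#include <sys/mman.h>  // mmap, MAP_SHARED
#include <unistd.h>    // readlink, ftruncate

// Hypothetical helper: resolve the path behind an open descriptor by reading
// the /proc/self/fd/<fd> symlink (Linux-specific). Returns "" on failure.
std::string PathFromFd(int fd) {
  assert(fd >= 0);
  const std::string link = "/proc/self/fd/" + std::to_string(fd);
  char buf[PATH_MAX];
  const ssize_t n = ::readlink(link.c_str(), buf, sizeof(buf) - 1);
  if (n < 0) return std::string();
  return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: size the file to `size` bytes and map it MAP_SHARED so
// that writes are visible to every process mapping the same file.
// Returns nullptr on failure.
void* MapShared(int fd, std::size_t size) {
  if (::ftruncate(fd, static_cast<off_t>(size)) != 0) return nullptr;
  void* addr =
      ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  return addr == MAP_FAILED ? nullptr : addr;
}
```

The descriptor itself stays process-local; only the string returned by
PathFromFd would be handed to the other application, which opens and maps the
file on its own.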
Second, we need to support variable buffer alignment (restricted to powers of
2, of course) when calling arrow::AllocateBuffer(). In the current
implementation the alignment is fixed at 64 bytes, and changing that value
requires recompilation [1].
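To make the alignment request concrete, the following is a minimal sketch of an
allocator that takes the alignment at run time. AllocateAligned, IsPowerOfTwo
and IsAligned are hypothetical names for illustration only; this is not how
arrow::AllocateBuffer() is implemented, and it assumes a POSIX system that
provides posix_memalign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // posix_memalign, free

// True when x is a non-zero power of two.
bool IsPowerOfTwo(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical helper (not Arrow API): allocate `size` bytes aligned to
// `alignment`. posix_memalign requires a power of two that is also a multiple
// of sizeof(void*), so anything else is rejected. Returns nullptr on failure;
// release the result with free().
void* AllocateAligned(std::size_t size, std::size_t alignment) {
  if (!IsPowerOfTwo(alignment) || alignment < sizeof(void*)) return nullptr;
  void* p = nullptr;
  if (::posix_memalign(&p, alignment, size) != 0) return nullptr;
  return p;
}

// True when p is aligned to `alignment`, which must be a power of two.
bool IsAligned(const void* p, std::size_t alignment) {
  assert(IsPowerOfTwo(alignment));
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

With such a signature a caller could keep 64 bytes for ordinary buffers and
request 4096 only for the buffers destined for a device.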
To justify the above suggestion: major FPGA vendors (e.g. Xilinx) benefit
heavily from page-aligned buffers, since their device memory works with 4 KB
pages [2]. In particular, Xilinx warns users if they attempt to memcpy a
non-page-aligned buffer from CPU memory to the FPGA's memory [3].
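A host-side check for this condition could look like the sketch below. The
function names are hypothetical, and the 4096-byte fallback is an assumption
used only if the page size cannot be queried; nothing here is taken from the
Xilinx runtime.

```cpp
#include <cstddef>
#include <cstdint>

#include <unistd.h>  // sysconf, _SC_PAGESIZE

// Page size reported by the OS; falls back to 4096 (an assumption) on error.
std::size_t PageSize() {
  const long ps = ::sysconf(_SC_PAGESIZE);
  return ps > 0 ? static_cast<std::size_t>(ps) : 4096;
}

// True when the address is a multiple of the page size.
bool IsPageAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % PageSize() == 0;
}

// Round a byte count up to the next multiple of the page size.
std::size_t RoundUpToPage(std::size_t n) {
  const std::size_t ps = PageSize();
  return (n + ps - 1) / ps * ps;
}
```

An application could call IsPageAligned on a buffer's data pointer before
handing it to the device runtime, and size its allocations with RoundUpToPage.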
I am open to discussing any suggestions, improvements or concerns.
[1]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L40
[2]: https://forums.xilinx.com/t5/SDAccel/memory-alignment-when-allocating-emmory-in-SDAccel/td-p/887593
[3]: https://forums.aws.amazon.com/thread.jspa?messageID=884615&tstart=0