[
https://issues.apache.org/jira/browse/ARROW-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845970#comment-16845970
]
Wes McKinney commented on ARROW-5069:
-------------------------------------
[~dimlek] It seems like you would need to draft a more detailed proposal
document explaining how things should ideally work. The Arrow data structures
can reference the memory from any {{Buffer}} subclass, and we already have
examples of referencing shared memory and GPU memory. So all of the machinery
is built already. The question becomes what kind of API can yield
shared-memory data structures. I'm interested to see what you propose.
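For example, a minimal sketch of what already works today (the helper name and
the no-nulls buffer layout are illustrative; exact signatures may differ across
versions):
{code:cpp}
// Expose externally owned memory as an Arrow array without copying. The same
// pattern applies to any Buffer subclass wrapping shared or device memory.
#include <cstdint>
#include <memory>

#include <arrow/api.h>

std::shared_ptr<arrow::Array> ViewAsInt64(const uint8_t* data, int64_t length) {
  // Wrap the foreign memory; the caller retains ownership of it.
  auto values = std::make_shared<arrow::Buffer>(
      data, length * static_cast<int64_t>(sizeof(int64_t)));
  // Buffer slot 0 is the validity bitmap (omitted, no nulls), slot 1 the values.
  auto array_data = arrow::ArrayData::Make(arrow::int64(), length,
                                           {nullptr, values}, /*null_count=*/0);
  return arrow::MakeArray(array_data);
}
{code}
A {{Buffer}} subclass backed by shared memory or device memory plugs into the
same two calls.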
> [C++] Implement direct support for shared memory arrow columns
> --------------------------------------------------------------
>
> Key: ARROW-5069
> URL: https://issues.apache.org/jira/browse/ARROW-5069
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Environment: Linux
> Reporter: Dimitris Lekkas
> Priority: Major
> Labels: performance, proposal
>
> I consider the option of memory-mapping columns to shared memory to be
> valuable. Such an option would be triggered when specific metadata are
> supplied. Given that many data frames backed by Arrow are used for machine
> learning, we could benefit from treating the data that will be fed into
> GPUs/FPGAs (most likely the columns' data buffers) differently. To enable
> such a change we would need to address the following issues:
> First, we need each column to hold an integer value representing its
> associated file descriptor. The application developer could then retrieve
> the file name from the file descriptor (e.g. via the fstat syscall) and
> inform another application to reference that file, or instruct an FPGA to
> DMA that memory area.
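> For illustration, a rough sketch of what I have in mind (the class name and
> factory are hypothetical, not an existing Arrow API): a Buffer subclass that
> owns a POSIX shared-memory mapping and exposes its file descriptor.
> {code:cpp}
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> #include <cstdint>
> #include <memory>
> #include <string>
>
> #include <arrow/buffer.h>
>
> class ShmBuffer : public arrow::Buffer {
>  public:
>   // name must look like "/arrow-col-0"; size is in bytes.
>   static std::shared_ptr<ShmBuffer> Create(const std::string& name,
>                                            int64_t size) {
>     int fd = shm_open(name.c_str(), O_CREAT | O_RDWR, 0600);
>     if (fd < 0) return nullptr;
>     if (ftruncate(fd, size) != 0) {
>       close(fd);
>       return nullptr;
>     }
>     void* addr = mmap(nullptr, static_cast<size_t>(size),
>                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>     if (addr == MAP_FAILED) {
>       close(fd);
>       return nullptr;
>     }
>     return std::shared_ptr<ShmBuffer>(
>         new ShmBuffer(static_cast<uint8_t*>(addr), size, fd));
>   }
>
>   // The integer handle each column would expose under this proposal.
>   int file_descriptor() const { return fd_; }
>
>   ~ShmBuffer() {
>     munmap(const_cast<uint8_t*>(data()), static_cast<size_t>(size()));
>     close(fd_);
>   }
>
>  private:
>   ShmBuffer(uint8_t* addr, int64_t size, int fd)
>       : arrow::Buffer(addr, size), fd_(fd) {}
>
>   int fd_;
> };
> {code}
> A column whose data buffer is such a ShmBuffer could hand the descriptor to
> another process or to a DMA engine.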
> We also need to support variable buffer alignment (restricted to powers of
> two, of course) when issuing an arrow::AllocateBuffer() call. In the current
> implementation the alignment is fixed at 64 bytes, and changing that value
> requires a recompilation [1].
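> As a sketch of the kind of API extension I mean (the function name and
> signature are hypothetical, not the current arrow::AllocateBuffer()), a
> caller-chosen power-of-two alignment could be honored with posix_memalign:
> {code:cpp}
> #include <cstdlib>
> #include <memory>
>
> #include <arrow/buffer.h>
> #include <arrow/status.h>
>
> arrow::Status AllocateAlignedBuffer(int64_t size, size_t alignment,
>                                     std::shared_ptr<arrow::Buffer>* out) {
>   // posix_memalign requires a power of two that is a multiple of sizeof(void*).
>   if (alignment < sizeof(void*) || (alignment & (alignment - 1)) != 0) {
>     return arrow::Status::Invalid("alignment must be a power of two");
>   }
>   void* ptr = nullptr;
>   if (posix_memalign(&ptr, alignment, static_cast<size_t>(size)) != 0) {
>     return arrow::Status::OutOfMemory("posix_memalign failed");
>   }
>   // MutableBuffer wraps caller-owned memory; a real implementation would tie
>   // the allocation's lifetime to a MemoryPool rather than leaving free() to
>   // the caller as this sketch does.
>   *out = std::make_shared<arrow::MutableBuffer>(static_cast<uint8_t*>(ptr),
>                                                 size);
>   return arrow::Status::OK();
> }
> {code}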
> To justify the above suggestion, major FPGA vendors (e.g. Xilinx) benefit
> heavily from page-aligned buffers since their device memory is organized in
> 4 KB pages [2]. In particular, Xilinx warns users if they attempt to memcpy
> a non-page-aligned buffer from CPU memory to the FPGA's memory [3].
> Wouldn't it be nice if we could issue from_pandas() and then have our
> columns memory-mapped to shared memory, so that FPGAs could DMA that memory
> and accelerate the workload? If there is already a workaround to achieve
> that, I would appreciate more information on it.
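> What I picture is roughly the following (a sketch only, assuming a recent
> Arrow C++ IPC API; exact signatures vary between versions): the producer
> writes the table in the IPC file format to a file under /dev/shm, and the
> consumer memory-maps it and reads the columns back zero-copy.
> {code:cpp}
> #include <memory>
> #include <string>
> #include <vector>
>
> #include <arrow/api.h>
> #include <arrow/io/api.h>
> #include <arrow/ipc/api.h>
>
> // Producer: serialize the table into a file on a tmpfs, e.g.
> // "/dev/shm/frame.arrow".
> arrow::Status ShareTable(const std::shared_ptr<arrow::Table>& table,
>                          const std::string& path) {
>   ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));
>   ARROW_ASSIGN_OR_RAISE(auto writer,
>                         arrow::ipc::MakeFileWriter(sink, table->schema()));
>   ARROW_RETURN_NOT_OK(writer->WriteTable(*table));
>   return writer->Close();
> }
>
> // Consumer: memory-map the same file and reconstruct the table zero-copy.
> arrow::Result<std::shared_ptr<arrow::Table>> MapTable(const std::string& path) {
>   ARROW_ASSIGN_OR_RAISE(auto mmapped, arrow::io::MemoryMappedFile::Open(
>                                           path, arrow::io::FileMode::READ));
>   ARROW_ASSIGN_OR_RAISE(auto reader,
>                         arrow::ipc::RecordBatchFileReader::Open(mmapped));
>   std::vector<std::shared_ptr<arrow::RecordBatch>> batches;
>   for (int i = 0; i < reader->num_record_batches(); ++i) {
>     ARROW_ASSIGN_OR_RAISE(auto batch, reader->ReadRecordBatch(i));
>     batches.push_back(std::move(batch));
>   }
>   return arrow::Table::FromRecordBatches(reader->schema(), batches);
> }
> {code}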
> I am open to discussing any suggestions, improvements, or concerns.
>
> [1]:
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L40]
> [2]:
> [https://forums.xilinx.com/t5/SDAccel/memory-alignment-when-allocating-emmory-in-SDAccel/td-p/887593]
> [3]: [https://forums.aws.amazon.com/thread.jspa?messageID=884615&tstart=0]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)