[jira] [Commented] (ARROW-18115) [C++] Acero buffer alignment

Antoine Pitrou (Jira) Mon, 24 Oct 2022 14:03:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-18115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623420#comment-17623420
 ]


Antoine Pitrou commented on ARROW-18115:
----------------------------------------

IIRC there isn't much contention on the PR (at least the fundamentals). It's 
just that review bandwidth is scarce, and the release was higher priority.

> [C++] Acero buffer alignment
> ----------------------------
>
>                 Key: ARROW-18115
>                 URL: https://issues.apache.org/jira/browse/ARROW-18115
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> This is a general JIRA to centralize some discussion / proposal on a buffer 
> alignment strategy for Acero based on discussions that have happened in a few 
> different contexts.  Any actual work will probably span multiple JIRAs, some 
> of which are already in progress.
> Motivations:
> * Incoming data may not be aligned at all.  Some kernel functions and parts 
> of Acero (e.g. aggregation, join) use explicit SIMD instructions (e.g. 
> intrinsics) and will fail / cause corruption if care is not taken to use 
> unaligned loads (e.g. ARROW-17783).  There are parts of the code that assume 
> fixed arrays with size T are at least T aligned.  This is generally a safe 
> assumption for data generated by arrow-c++ (which allocates all buffers with 
> 64 byte alignment) but then leads to errors when processing data from flight.
> * Dataset writes and mid-plan spilling both can benefit form direct I/O, less 
> for performance reasons and more for memory management reasons.  However, in 
> order to use direct I/O a buffer needs to be aligned, typically to 512 bytes. 
>  This is larger than the current minimum alignment requirements.
> Proposal:
>  * Allow the minimum alignment of a memory pool to be configurable.  This is 
> similar to the proposal of ARROW-17836 but does not require much of an API 
> change (at the expense of being slightly less flexible).
>  * Add a capability to realign a buffer/array/batch/table to a target 
> alignment.  This would only modify buffers that are not already aligned.  
> Basically, given an arrow object, and a target memory pool, migrate a buffer 
> to the target memory pool if its alignment is insufficient.
>  * Acero, in the source node, forces all buffers to be 64 byte aligned.  This 
> way the internals of Acero don't have to worry about this case.  This 
> introduces a performance penalty when buffers are not aligned but is much 
> simpler to maintain and test than trying to support any random buffer.  To 
> avoid this penalty it would be simpler to avoid the unaligned buffers in the 
> first place.
>  * Acero requires a memory pool that has 512-byte alignment so that 
> Acero-allocated buffers can use direct I/O.  If the default memory pool does 
> not have 512-byte alignment then Acero can use a per-plan pool.  This covers 
> the common case for spilling and dataset writing which is that we are 
> partitioning prior to the write and so we are writing Acero-allocated buffers 
> anyways.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-18115) [C++] Acero buffer alignment

Reply via email to